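I am trying to implement YOLO from this paper (the paper does not say it is v1, but it is the first paper, so I assume it is v1). I am implementing it with Keras and TensorFlow 1.x on Google Colab.
TL;DR, results:
Starting epochs: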
Iteration, 0
Train on 1800 samples, validate on 450 samples
Epoch 1/32
1800/1800 [==============================] - 13s 7ms/step - loss: 541.8767 - mean_iou_metric: 0.0040 - val_loss: 361.9846 - val_mean_iou_metric: 0.0043
Epoch 2/32
1800/1800 [==============================] - 11s 6ms/step - loss: 378.6184 - mean_iou_metric: 0.0042 - val_loss: 330.4124 - val_mean_iou_metric: 0.0043
Final epochs (320 epochs in total, 32 per loop iteration):
Epoch 31/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.4603 - mean_iou_metric: 0.0038 - val_loss: 350.3984 - val_mean_iou_metric: 0.0042
Epoch 32/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.2410 - mean_iou_metric: 0.0038 - val_loss: 349.5258 - val_mean_iou_metric: 0.0042
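Problem: Even after this many epochs the loss decreases very little (it does decrease, which is fine), but mean_iou does not increase, and that worries me. Why is this happening? At this stage I cannot work out why the IoU is not increasing even though the loss decreases. Are loss values like these normal? They do not look normal to me, so any help would be appreciated. I suspect I have done something wrong in the loss function implementation.
Dataset: The dataset I am using has shape 2500*256*256*3 and consists of white backgrounds with 3 kinds of colored shapes (rectangles, triangles, and circles). An image contains anywhere from zero to 3 shapes, which may all be of the same type or of different types. It can be generated with the Python file from here. Example image:
Parameters: Following the paper, I set S (for the SxS grid), B (the number of bounding boxes per grid cell), and C (the number of classes) as follows: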
N=len(labels)
print("No of images, ",N)
# No of bounding boxes per grid, B
B=1
# No of grids,S*S
S=16
# No. of classes, C
C=3 #3 for 3 types of shapes
# Output=SxSx(5B+C)
I_S=256 # Image dimension I_SxI_S
classes={'circle':0,'triangle':1,'rectangle':2}
lenClasses=len(classes)
#print(lenClasses)
norm_const=I_S/S
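Note that I have defined a constant called norm_const. It is used to normalize the ground-truth center coordinates (heights and widths are divided by I_S instead), all of which start out in the 0-255 pixel range.
How I normalize the images, center coordinates, heights, and widths: The ground truth is a JSON structure holding the x1, x2, y1, y2 coordinates of the bounding box of each shape in an image. I compute the center, height, and width values and normalize them. My final vector for each grid cell is [1,cx,cy,h,w,0,0,1], where the first value is the confidence score, the next four are the coordinates, and the last three are the class scores. If no shape center falls in a grid cell, its vector is automatically [0,0,0,0,0,0,0,0] (from the numpy zeros initialization).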
x1='x1'
x2='x2'
y1='y1'
y2='y2'
boxes='boxes'
classVar='class'
for box in labels[i][boxes]:
    cx1,cx2=box[x1],box[x2]
    cy1,cy2=box[y1],box[y2]
    # Design one hot vector, [0]*3 gives [0,0,0]
    onehot=[0]*lenClasses
    onehot[classes[box[classVar]]]=1 # box[classVar] gives string 'className', which is fed into classes dictionary, which gives position (0/1/2)
    # Centers and h,w
    cx,cy,h,w=(cx1+cx2)/2.0,(cy1+cy2)/2.0,np.abs(cy2-cy1),np.abs(cx2-cx1)
    # Now, to compute where in a grid of SxS, the center would lie
    posx,posy=int((cx*S)/I_S),int((cy*S)/I_S)
    # NORMALIZE h,w
    h=h/I_S # I_S is image size
    w=w/I_S
    # NORMALIZE cx,cy
    cx=(cx-posx*norm_const)/norm_const
    cy=(cy-posy*norm_const)/norm_const
    # RESTORING cx,cy from normalized values
    #cxo=(cx+posx)*norm_const
    #cyo=(cy+posy)*norm_const
    gt[image_number,posx,posy]=1,cx,cy,h,w,*onehot # gt is defined as np.zeros((N,S,S,5+lenClasses))
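As a sanity check on the normalization above, here is a minimal sketch of the same arithmetic as a round trip: encode one pixel-space box into its grid cell and offsets, then restore it. The helper names encode_box and decode_box and the sample box are hypothetical (they are not part of the original code); S and I_S take the values defined earlier.

```python
S = 16                 # grid cells per side, as defined earlier
I_S = 256              # image side length in pixels
norm_const = I_S / S   # pixels per grid cell (16.0)


def encode_box(x1, x2, y1, y2):
    """Hypothetical helper: pixel corners -> (posx, posy, cx, cy, h, w)."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    h, w = abs(y2 - y1), abs(x2 - x1)
    # Grid cell that contains the box center.
    posx, posy = int(cx * S / I_S), int(cy * S / I_S)
    # Center as an offset inside that cell, in [0, 1).
    cx_n = (cx - posx * norm_const) / norm_const
    cy_n = (cy - posy * norm_const) / norm_const
    # Height and width as fractions of the whole image.
    return posx, posy, cx_n, cy_n, h / I_S, w / I_S


def decode_box(posx, posy, cx_n, cy_n, h_n, w_n):
    """Hypothetical helper: inverse of encode_box, back to pixels."""
    cx = (cx_n + posx) * norm_const
    cy = (cy_n + posy) * norm_const
    return cx, cy, h_n * I_S, w_n * I_S


encoded = encode_box(40, 72, 100, 140)   # made-up box
decoded = decode_box(*encoded)
print(encoded)   # (3, 7, 0.5, 0.5, 0.15625, 0.125)
print(decoded)   # (56.0, 120.0, 40.0, 32.0)
```

If the restored center and size match the original box, the encoding step itself is probably not the source of the low IoU, and attention can move to how those channels are read back later.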
Loss function implementation:
from keras import backend as K
coord=10 # We want loss from coordinates to have more weightage
noobj=0.1
# A simple loss
# How output is arranged-> B confidence values, 4B normalized coordinates, one hot vector
def yolo_loss_trial(y_true,y_pred):
    localizationLoss=0.0
    classificationLoss=0.0
    confidenceLoss=0.0
    batchsize_as_tensor_obj=tf.shape(y_pred)[0]
    object_presence=tf.reshape(y_true[:,:,:,0],shape=[batchsize_as_tensor_obj,S,S,1]) #from batchx16x16 to batchx16x16x1
    # CLASSIFICATION LOSS
    # batch x S x S x 1 * batch x S x S x 3, allowed
    classificationLoss=K.sum(K.square((object_presence*y_true[:,:,:,5:8])-y_pred[:,:,:,5*B:5*B+C]))
    # LOCALIZATION LOSS
    for i in range(B,5*B,4):
        # batch x S x S x 1 * batch x S x S x 2, allowed
        localizationLoss=localizationLoss+(K.sum(K.square((object_presence*y_true[:,:,:,1:3])-y_pred[:,:,:,i:i+2])))
        localizationLoss=localizationLoss+(K.sum(K.square(K.sqrt(object_presence*y_true[:,:,:,3:5])-K.sqrt(y_pred[:,:,:,i+2:i+4]))))
    localizationLoss=localizationLoss*coord
    # CONFIDENCE LOSS
    for i in range(0,B):
        y_iou=return_iou_tensor(y_true,y_pred,i) # batch x S x S
        # take 1 from 1,noobj and noobj from noobj,0
        object_presence_modified=tf.math.maximum(y_true[:,:,:,0],noobj) # batch x S x S
        confidenceLoss=confidenceLoss+(K.sum(K.square((object_presence_modified*y_true[:,:,:,0]*y_iou)-y_pred[:,:,:,i]))) # batch x S x S ops
    return localizationLoss+classificationLoss+confidenceLoss
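One thing that may be worth checking in the loss above: the object mask multiplies only the target (object_presence*y_true minus y_pred), so cells without an object are still pushed toward zero for their class and coordinate outputs. In the loss given in the YOLO paper, the object indicator multiplies the squared error itself. The NumPy sketch below contrasts the two forms on made-up numbers for two cells. It is an illustration under that reading of the paper, not a drop-in replacement for yolo_loss_trial.

```python
import numpy as np

# Two grid cells with three class scores each (made-up values).
# Cell 0 contains an object of class index 2; cell 1 is empty.
obj_mask = np.array([[1.0], [0.0]])              # shape (2, 1), broadcasts over classes
y_true_cls = np.array([[0.0, 0.0, 1.0],
                       [0.0, 0.0, 0.0]])
y_pred_cls = np.array([[0.1, 0.2, 0.7],
                       [0.3, 0.4, 0.5]])

# Form used in the posted loss: the mask scales the target only,
# so the empty cell contributes its full squared prediction.
loss_target_masked = np.sum(np.square(obj_mask * y_true_cls - y_pred_cls))

# Form where the mask scales the squared error,
# so the empty cell contributes nothing.
loss_error_masked = np.sum(obj_mask * np.square(y_true_cls - y_pred_cls))

print(loss_target_masked)   # about 0.64
print(loss_error_masked)    # about 0.14
```

With 256 cells and at most 3 objects per image, the empty-cell terms could dominate the total, which might explain a loss that falls while box quality does not improve. A related observation: object_presence_modified*y_true[:,:,:,0] evaluates to 1 for object cells and to 0 for empty cells, so noobj does not appear to influence the confidence term as written.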
IoU (Intersection over Union) implementation for the confidence score: inspired by the implementation linked here, the IoU is computed as follows:
import tensorflow as tf
from keras import backend as K
# Creating INDICES tensor to add to normalized centers
indices=np.reshape(np.arange(S),[1,S]) # consists of 0 to S-1, i.e., indices.
indices_tensor_Y=tf.constant(indices,dtype=float) # 1x S
indices_tensor_Y=tf.repeat(indices_tensor_Y,repeats=[S],axis=0) # S x S, 0123S;0123S;0123S S rows
indices_tensor_X=tf.transpose(indices_tensor_Y) # S x S
indices_tensor_Y=tf.reshape(indices_tensor_Y,[1,S,S]) # 1 x S x S
indices_tensor_X=tf.reshape(indices_tensor_X,[1,S,S]) # 1 x S x S
#indices_tensor=tf.repeat(indices_tensor,repeats=[batch_tensor],axis=0) # batch x S x S
# repeat() will repeat axis-0 (SxS), batch_tensor number of times along the channel
# IOU Calculation between two bounding boxes
def return_iou_tensor(box_true,box_pred,i):
    '''
    box_true=batch x S x S x 8
    box_pred=batch x S x S x (5B+C)
    '''
    # Restored gt
    cx_restored_gt_tensor=norm_const*(indices_tensor_X+box_true[:,:,:,2]) # 1 x S x S + batch x S x S = batch x S x S
    cy_restored_gt_tensor=norm_const*(indices_tensor_Y+box_true[:,:,:,3]) # 1 x S x S + batch x S x S = batch x S x S
    h_restored_gt_tensor=box_true[:,:,:,4]*I_S # batch x S x S
    w_restored_gt_tensor=box_true[:,:,:,5]*I_S # batch x S x S
    # Restored predicted
    cx_restored_pred_tensor=norm_const*(indices_tensor_X+box_pred[:,:,:,B+4*i]) # 1 x S x S + batch x S x S = batch x S x S
    cx_restored_pred_tensor=tf.math.maximum(cx_restored_pred_tensor,0)# To remove negative values
    cy_restored_pred_tensor=norm_const*(indices_tensor_Y+box_pred[:,:,:,B+1+4*i]) # 1 x S x S + batch x S x S = batch x S x S
    cy_restored_pred_tensor=tf.math.maximum(cy_restored_pred_tensor,0)# To remove negative values
    h_restored_pred_tensor=box_pred[:,:,:,B+2+4*i]*I_S # batch x S x S
    h_restored_pred_tensor=tf.math.maximum(h_restored_pred_tensor,0)# To remove negative values
    w_restored_pred_tensor=box_pred[:,:,:,B+3+4*i]*I_S # batch x S x S
    w_restored_pred_tensor=tf.math.maximum(w_restored_pred_tensor,0)# To remove negative values
    # min max of intersection box all, batch x S x S
    x_min_tensor=tf.math.maximum(tf.math.maximum(cx_restored_gt_tensor-w_restored_gt_tensor/2,0),tf.math.maximum(cx_restored_pred_tensor-w_restored_pred_tensor/2,0))
    y_min_tensor=tf.math.maximum(tf.math.maximum(cy_restored_gt_tensor-h_restored_gt_tensor/2,0),tf.math.maximum(cy_restored_pred_tensor-h_restored_pred_tensor/2,0))
    x_max_tensor=tf.math.minimum(cx_restored_gt_tensor+w_restored_gt_tensor/2,cx_restored_pred_tensor+w_restored_pred_tensor/2)
    y_max_tensor=tf.math.minimum(cy_restored_gt_tensor+h_restored_gt_tensor/2,cy_restored_pred_tensor+h_restored_pred_tensor/2)
    w_intersection=tf.math.maximum(x_max_tensor-x_min_tensor,0)
    h_intersection=tf.math.maximum(y_max_tensor-y_min_tensor,0)
    intersection_tensor=w_intersection*h_intersection # batch x S x S
    union_tensor=(w_restored_gt_tensor*h_restored_gt_tensor)+(w_restored_pred_tensor*h_restored_pred_tensor) # batch x S x S
    smooth=1 # We are using smooth because we dont want division by 0
    return (intersection_tensor+smooth)/(union_tensor+smooth) #batch x S x S
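Two details in return_iou_tensor may be worth verifying. First, with the ground-truth layout [confidence, cx, cy, h, w, classes] described earlier, the function reads box_true channels 2 to 5, while it reads box_pred channels B+4*i to B+3+4*i, which is 1 to 4 for B=1. Second, union_tensor adds the two areas without subtracting the intersection. The pure-Python sketch below prints the channel names each index range lands on and computes IoU for center-format boxes with the usual union (area_a + area_b - intersection). CHANNELS and iou_center_format are hypothetical names introduced for illustration, and the sample boxes are made up.

```python
B = 1
# Channel layout of one grid-cell vector, as described in the post.
CHANNELS = ["conf", "cx", "cy", "h", "w", "circle", "triangle", "rectangle"]

# Channels the posted function reads from box_true (indices 2..5) ...
gt_read = [CHANNELS[k] for k in range(2, 6)]
# ... and from box_pred for box i=0 (indices B+4*i .. B+3+4*i).
pred_read = [CHANNELS[k] for k in range(B, B + 4)]
print(gt_read)    # ['cy', 'h', 'w', 'circle']
print(pred_read)  # ['cx', 'cy', 'h', 'w']


def iou_center_format(box_a, box_b):
    """IoU of two boxes given as (cx, cy, w, h), all in pixels."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the overlapping rectangle.
    x_min = max(ax - aw / 2, bx - bw / 2)
    y_min = max(ay - ah / 2, by - bh / 2)
    x_max = min(ax + aw / 2, bx + bw / 2)
    y_max = min(ay + ah / 2, by + bh / 2)
    inter = max(x_max - x_min, 0) * max(y_max - y_min, 0)
    union = aw * ah + bw * bh - inter   # subtract the overlap counted twice
    return inter / union if union > 0 else 0.0


print(iou_center_format((10, 10, 10, 10), (10, 10, 10, 10)))  # 1.0
print(iou_center_format((10, 10, 10, 10), (15, 10, 10, 10)))  # one third
```

If the ground-truth indices are indeed shifted by one, the function would be comparing predicted centers against ground-truth cy and h, which alone could keep the IoU near zero. With the sum-of-areas denominator, two identical boxes of area A would also score (A+1)/(2A+1), roughly 0.5, instead of 1.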
And the mean_iou_metric to monitor during training:
def mean_iou_metric(y_true,y_pred):
    mean_iou=0.0
    for i in range(0,B):
        iou_tensor=y_true[:,:,:,0]*return_iou_tensor(y_true,y_pred,i)
        mean_iou=mean_iou+K.mean(iou_tensor)
    return mean_iou/B
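Because K.mean averages y_true[:,:,:,0]*IoU over all S*S cells, and at most 3 cells per image contain an object, this metric has a low ceiling no matter how well the boxes fit. The short calculation below works out that ceiling for the values used here, and compares an all-cells mean with a mean taken only over object cells. It assumes each shape lands in its own cell and that IoU never exceeds 1; mean_iou_over_object_cells is a hypothetical alternative, and the IoU values are made up.

```python
S = 16
max_objects_per_image = 3   # from the dataset description

cells = S * S
ceiling = max_objects_per_image / cells
print(cells, ceiling)   # 256 0.01171875


def mean_iou_over_object_cells(ious, mask):
    """Hypothetical alternative: average IoU only over cells that hold an object."""
    total = sum(i * m for i, m in zip(ious, mask))
    count = sum(mask)
    return total / count if count else 0.0


# One image with two object cells whose IoUs are 0.5 and 0.7 (made-up values).
mask = [0] * cells
ious = [0.0] * cells
mask[10], ious[10] = 1, 0.5
mask[200], ious[200] = 1, 0.7

all_cells_mean = sum(i * m for i, m in zip(ious, mask)) / cells
object_cells_mean = mean_iou_over_object_cells(ious, mask)
print(all_cells_mean)      # about 0.0047
print(object_cells_mean)   # about 0.6
```

The logged values of roughly 0.004 sit below this ceiling of about 0.0117, so the metric may look flat partly because of how it is averaged, independent of any issue in the IoU computation itself.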
Model (based on Fast YOLO):
TL;DR, image:
from keras import backend as K
from keras.layers import Input,Conv2D,MaxPooling2D,Flatten,Dense,Reshape
from keras.models import Model
import tensorflow as tf
def custom_activation(x):
    # LEAKY RELU
    isPositive=K.cast(K.greater(x,0),K.floatx()) # THE OUTPUT OF THE COMPARISON MUST BE CAST TO FLOAT; BOOL IS NOT ACCEPTED
    # OUTPUT OF THIS FUNCTION IS A TENSOR
    return (isPositive*x)+(1-isPositive)*0.1*x
############### BLOCK 1 ##############################
input_=Input(shape=(256,256,3),name='input')
#zeropad1=ZeroPadding2D(padding=(3,3))(input_) # PADDING MAKES 448->3+448+3, it is required to bring output to 112
convLayer1=Conv2D(64,(7,7),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer1')(input_)
maxpoolLayer1=MaxPooling2D(pool_size=(2,2),name='max_pool_layer1')(convLayer1)
#zeropad2=ZeroPadding2D(padding=(1,1))(maxpoolLayer1)
########################################################
############### BLOCK 2 ##############################
convLayer2=Conv2D(192,(3,3),padding='valid',activation=custom_activation,name='conv_layer2')(maxpoolLayer1)
maxpoolLayer2=MaxPooling2D(pool_size=(2,2),name='max_pool_layer2')(convLayer2)
#zeropad3=ZeroPadding2D(padding=(2,2))(maxpoolLayer2)
########################################################
############### BLOCK 3 ##############################
convLayer3=Conv2D(128,(1,1),padding='valid',activation=custom_activation,name='conv_layer3')(maxpoolLayer2)
convLayer4=Conv2D(256,(3,3),padding='valid',activation=custom_activation,name='conv_layer4')(convLayer3)
#convLayer5=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer5')(convLayer4)
#convLayer6=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer6')(convLayer5)
maxpoolLayer3=MaxPooling2D(pool_size=(2,2),name='max_pool_layer3')(convLayer4)
#zeropad4=ZeroPadding2D(padding=(5,5))(maxpoolLayer3)
########################################################
############### BLOCK 4 ##############################
convLayer7=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer7')(maxpoolLayer3)
convLayer8=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer8')(convLayer7)
#convLayer9=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer9')(convLayer8)
#convLayer10=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer10')(convLayer9)
#convLayer11=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer11')(convLayer10)
#convLayer12=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer12')(convLayer11)
#convLayer13=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer13')(convLayer12)
#convLayer14=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer14')(convLayer13)
#convLayer15=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer15')(convLayer14)
#convLayer16=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer16')(convLayer15)
maxpoolLayer4=MaxPooling2D(pool_size=(2,2),name='max_pool_layer4')(convLayer8)
#zeropad5=ZeroPadding2D(padding=(4,4))(maxpoolLayer4)
###########################################################
############### BLOCK 5 ##################################
convLayer17=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer17')(maxpoolLayer4)
convLayer18=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer18')(convLayer17)
#convLayer19=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer19')(convLayer18)
#convLayer20=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer20')(convLayer19)
#convLayer21=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer21')(convLayer20)
#convLayer22=Conv2D(1024,(3,3),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer22')(convLayer21)
#zeropad6=ZeroPadding2D(padding=(2,2))(convLayer18)
#############################################################
################ BLOCK 6 ####################################
convLayer23=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer23')(convLayer18)
#convLayer24=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer24')(convLayer23)
flattenedLayer1=Flatten()(convLayer23) # Flatten just converts 3d matrix to 1d so that it can be connected to a next Dense Layer
###############################################################
################ BLOCK 7 #########################################
denseLayer1=Dense(units=4096,activation=custom_activation)(flattenedLayer1)
##################################################################
################ BLOCK 8 #########################################
denseLayer2=Dense(units=S*S*(5*B+C),activation='linear')(denseLayer1)
output_=Reshape((S,S,(5*B+C)))(denseLayer2) # Reshapes the 1D to 3D
##################################################################
fast_model=Model(inputs=input_,outputs=output_)
fast_model.summary()
from keras.utils import plot_model
plot_model(fast_model,to_file='unet.png',show_shapes=True)
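Since every convolution above uses padding='valid', the spatial size shrinks at each 7x7 or 3x3 layer. The sketch below traces the feature-map side length through the uncommented layers using the standard output-size formula floor((n - kernel) / stride) + 1. The helper out_size is a hypothetical name, and the layer list is transcribed from the model above, so it is worth comparing against the output of fast_model.summary().

```python
def out_size(n, kernel, stride=1):
    """Side length after a 'valid' convolution or pooling layer."""
    return (n - kernel) // stride + 1


# (layer name, kernel size, stride), transcribed from the active layers.
layers = [
    ("conv_layer1", 7, 2), ("max_pool_layer1", 2, 2),
    ("conv_layer2", 3, 1), ("max_pool_layer2", 2, 2),
    ("conv_layer3", 1, 1), ("conv_layer4", 3, 1), ("max_pool_layer3", 2, 2),
    ("conv_layer7", 1, 1), ("conv_layer8", 3, 1), ("max_pool_layer4", 2, 2),
    ("conv_layer17", 1, 1), ("conv_layer18", 3, 1),
    ("conv_layer23", 3, 1),
]

n = 256   # input side length
for name, kernel, stride in layers:
    n = out_size(n, kernel, stride)
    print(name, n)

S, B, C = 16, 1, 3
flattened = n * n * 1024             # conv_layer23 has 1024 filters
output_units = S * S * (5 * B + C)   # size of the final Dense layer
print(flattened, output_units)       # 4096 2048
```

Under these assumptions the last convolution produces a 2x2x1024 map, so the whole 16x16 grid of predictions is generated by the two Dense layers from 4096 flattened features.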
Latest training parameters:
model=fast_model # SELECT MODEL
model.save_weights('weights.hdf5')
model.compile(optimizer=Adam(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
#model.compile(optimizer=SGD(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
model.load_weights('weights.hdf5')
checkpointer = callbacks.ModelCheckpoint(filepath = 'weights.hdf5',save_best_only=True)
training_log = callbacks.TensorBoard(log_dir='./Model_logs')
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,patience=3, min_lr=1e-5,mode='auto') # ADD IN CALLBACK in fit()
# patience is after how many epochs if improvement is not seen, then reduce lr, newlr=lr*factor
for i in range(0,10):
    print("Iteration, ",i)
    history=model.fit(X_train,Y_train,validation_data=(X_val,Y_val),batch_size=16,epochs=32,callbacks=[training_log,checkpointer,reduce_lr],shuffle=True)
    # SAVE MODEL TO DRIVE
    !cp '/content/weights.hdf5' 'gdrive/My Drive/Colab Notebooks/Colab Datasets/Breast_Cancer_HNS/Images'
# CONFIRM EXECUTION TIMESTAMP
from datetime import datetime
import pytz
tz = pytz.timezone('Asia/Calcutta')
berlin_now = datetime.now(tz)
dt_string = berlin_now.strftime("%d/%m/%Y %H:%M:%S")
print(dt_string)
#from google.colab import output
#output.eval_js('new Audio("https://ssl.gstatic.com/dictionary/static/sounds/20180430/complete--_us_1.mp3").play()')
# SAVE WHOLE MODEL TO LOCAL/COLAB DRIVE
model.save("FastYolo") #Saves weights also according to official docs
# SAVE MODEL TO GOOGLE DRIVE
!cp '/content/FastYolo' '/content/gdrive/My Drive/Colab Notebooks/Colab Datasets/Shape_Detection_YOLO/None2500NoOverlap'
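In the settings above, ReduceLROnPlateau is given min_lr=1e-5, the same value as the Adam learning rate. Keras does not reduce the rate below min_lr, so with these settings the callback may never lower it. The sketch below reproduces only that clipping arithmetic in plain Python; next_lr is a hypothetical helper, not a Keras function.

```python
def next_lr(lr, factor, min_lr):
    """Learning rate after one plateau: scaled by factor, but never below min_lr."""
    return max(lr * factor, min_lr)


# Settings from the post: lr=1e-5, factor=0.2, min_lr=1e-5.
as_posted = next_lr(1e-5, 0.2, 1e-5)
print(as_posted)   # 1e-05, unchanged

# For comparison, a made-up higher starting rate leaves room to decay.
higher_start = next_lr(1e-3, 0.2, 1e-5)
print(higher_start)
```

If a decaying schedule is intended, either a larger starting rate or a smaller min_lr would be needed; which of the two suits this model is an open question.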