
I am trying to implement YOLO from this paper (the paper does not say it is v1, but it is the first YOLO paper, so I am assuming it is v1). I am implementing it with Keras and TensorFlow 1.x on Google Colab.

TLDR; results:

Starting epochs:

Iteration,  0
Train on 1800 samples, validate on 450 samples
Epoch 1/32
1800/1800 [==============================] - 13s 7ms/step - loss: 541.8767 - mean_iou_metric: 0.0040 - val_loss: 361.9846 - val_mean_iou_metric: 0.0043
Epoch 2/32
1800/1800 [==============================] - 11s 6ms/step - loss: 378.6184 - mean_iou_metric: 0.0042 - val_loss: 330.4124 - val_mean_iou_metric: 0.0043

Ending epochs (320 epochs in total, 32 per loop):

Epoch 31/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.4603 - mean_iou_metric: 0.0038 - val_loss: 350.3984 - val_mean_iou_metric: 0.0042
Epoch 32/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.2410 - mean_iou_metric: 0.0038 - val_loss: 349.5258 - val_mean_iou_metric: 0.0042

Problem: Even after this many epochs, the loss decreases only a little (it does decrease, which is fine), but mean_iou does not increase, which worries me. Why is this happening? At this point I am unable to debug why the IoU does not increase even though the loss decreases. Are loss values of this magnitude normal? They do not look normal to me, so any help would be appreciated. I suspect I have done something wrong in the loss function implementation.

Dataset: The dataset I am using has size 2500*256*256*3 and consists of a white background with coloured shapes of 3 kinds (rectangle, triangle and circle). An image can contain anywhere from zero up to 3 shapes, and the shapes can all be of the same type or of different types as described above. It can be generated from here using the Python file. Example image:

Dataset image

Parameter description: Following the paper, I set S (so an SxS grid), B (number of bounding boxes per grid cell) and C (number of classes) as follows:

N=len(labels)
print("No of images, ",N)
# No of bounding boxes per grid, B
B=1
# No of grids,S*S
S=16
# No. of classes, C
C=3 #3 for 3 types of shapes
# Output=SxSx(5B+C)
I_S=256 # Image dimension I_SxI_S
classes={'circle':0,'triangle':1,'rectangle':2}
lenClasses=len(classes)
#print(lenClasses)
norm_const=I_S/S
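
With these settings, each image maps to an S x S x (5B+C) = 16 x 16 x 8 target tensor, so the final Dense layer has to output S*S*(5*B+C) = 2048 values. A quick arithmetic check, using only the constants above:

# Quick arithmetic check of the target/output tensor size, using only the constants above
depth=5*B+C                                          # 5*1+3 = 8 values per grid cell
print("Per-image output shape:",(S,S,depth))         # (16, 16, 8)
print("Units in the final Dense layer:",S*S*depth)   # 16*16*8 = 2048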

Note that I have defined a constant called norm_const, which is used to normalize the ground-truth centre coordinates, height and width, which are originally in the 0-255 pixel range.

How I normalize the images, centre coordinates, height and width: the ground truth is a JSON structure holding the x1, x2, y1, y2 coordinates of the bounding box of each shape in an image. I compute the centre, height and width values and normalize them. My final vector for each grid cell looks like [1,cx,cy,h,w,0,0,1], where the last three values are the class scores, the first value is the confidence score and the rest are the coordinates. If a grid cell contains no object centre, its vector is automatically [0,0,0,0,0,0,0,0] (from the numpy zeros initialization).

x1='x1'
x2='x2'
y1='y1'
y2='y2'
boxes='boxes'
classVar='class'
for box in labels[i][boxes]:
    cx1,cx2=box[x1],box[x2]
    cy1,cy2=box[y1],box[y2]
    # Design one hot vector, [0]*3 gives [0,0,0]
    onehot=[0]*lenClasses
    onehot[classes[box[classVar]]]=1 # box[classVar] gives string 'className', which is fed into classes dictionary, which gives position (0/1/2)
    # Centers and h,w
    cx,cy,h,w=(cx1+cx2)/2.0,(cy1+cy2)/2.0,np.abs(cy2-cy1),np.abs(cx2-cx1)
    # Now, to compute where in a grid of SxS, the center would lie
    posx,posy=int((cx*S)/I_S),int((cy*S)/I_S)
    
    # NORMALIZE h,w
    h=h/I_S # I_S is image size
    w=w/I_S
    # NORMALIZE cx,cy
    cx=(cx-posx*norm_const)/norm_const
    cy=(cy-posy*norm_const)/norm_const
    # RESTORING cx,cy from normalized values
    #cxo=(cx+posx)*norm_const
    #cyo=(cy+posy)*norm_const
    gt[image_number,posx,posy]=1,cx,cy,h,w,*onehot # gt is defined as np.zeros((N,S,S,5+lenClasses))
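
As a quick sanity check of the encoding (a minimal sketch of my own, not part of the training pipeline), the commented-out restore formulas above can be turned into a small decoder that maps one encoded grid cell back to pixel coordinates, which should reproduce the original cx, cy, h, w:

# Minimal sanity check (my own helper, not part of the pipeline): decode one encoded
# grid cell back to pixel space, inverting the normalization above.
def decode_cell(cell_vec,posx,posy):
    conf,cx_n,cy_n,h_n,w_n=cell_vec[:5]
    cx_px=(cx_n+posx)*norm_const   # inverse of cx=(cx-posx*norm_const)/norm_const
    cy_px=(cy_n+posy)*norm_const
    return cx_px,cy_px,h_n*I_S,w_n*I_S

# e.g. if gt[image_number,posx,posy,0]==1, this should give back the original cx,cy,h,w in pixels
#print(decode_cell(gt[image_number,posx,posy],posx,posy))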

Loss function implementation:

import tensorflow as tf
from keras import backend as K
coord=10  # We want the loss from the coordinates to have more weight
noobj=0.1 # down-weight for the confidence loss of cells without an object
# A simple loss
# How output is arranged-> B confidence values, 4B normalized coordinates, one hot vector
def yolo_loss_trial(y_true,y_pred):
  localizationLoss=0.0
  classificationLoss=0.0
  confidenceLoss=0.0
  batchsize_as_tensor_obj=tf.shape(y_pred)[0]
  object_presence=tf.reshape(y_true[:,:,:,0],shape=[batchsize_as_tensor_obj,S,S,1]) #from batchx16x16 to batchx16x16x1

  # CLASSIFICATION LOSS
  # batch x S x S x 1 * batch x S x S x 3, allowed
  classificationLoss=K.sum(K.square((object_presence*y_true[:,:,:,5:8])-y_pred[:,:,:,5*B:5*B+C]))

  # LOCALIZATION LOSS
  for i in range(B,5*B,4):
    # batch x S x S x 1 * batch x S x S x 2, allowed
    localizationLoss=localizationLoss+(K.sum(K.square((object_presence*y_true[:,:,:,1:3])-y_pred[:,:,:,i:i+2])))
    localizationLoss=localizationLoss+(K.sum(K.square(K.sqrt(object_presence*y_true[:,:,:,3:5])-K.sqrt(y_pred[:,:,:,i+2:i+4]))))
  localizationLoss=localizationLoss*coord

  # CONFIDENCE LOSS
  for i in range(0,B):
    y_iou=return_iou_tensor(y_true,y_pred,i) #  batch x S x S
    # max(1,noobj)=1 for cells that contain an object, max(0,noobj)=noobj for empty cells
    object_presence_modified=tf.math.maximum(y_true[:,:,:,0],noobj) # batch x S x S
    confidenceLoss=confidenceLoss+(K.sum(K.square((object_presence_modified*y_true[:,:,:,0]*y_iou)-y_pred[:,:,:,i]))) # batch x S x S ops

  return localizationLoss+classificationLoss+confidenceLoss

IoU (Intersection over Union) implementation for the confidence score: inspired from here, the IoU is implemented as follows:

import numpy as np
import tensorflow as tf
from keras import backend as K

# Creating INDICES tensor to add to normalized centers
indices=np.reshape(np.arange(S),[1,S]) # consists of 0 to S-1, i.e., indices.
indices_tensor_Y=tf.constant(indices,dtype=float) # 1x S
indices_tensor_Y=tf.repeat(indices_tensor_Y,repeats=[S],axis=0) # S x S, 0123S;0123S;0123S S rows
indices_tensor_X=tf.transpose(indices_tensor_Y) # S x S
indices_tensor_Y=tf.reshape(indices_tensor_Y,[1,S,S]) # 1 x S x S
indices_tensor_X=tf.reshape(indices_tensor_X,[1,S,S]) # 1 x S x S
#indices_tensor=tf.repeat(indices_tensor,repeats=[batch_tensor],axis=0) # batch x S x S
# repeat() will repeat axis-0 (SxS), batch_tensor number of times along the channel

# IOU Calculation between two bounding boxes
def return_iou_tensor(box_true,box_pred,i):
  '''
  box_true=batch x S x S x 8
  box_pred=batch x S x S x (5B+C)
  '''
  
  # Restored gt
  cx_restored_gt_tensor=norm_const*(indices_tensor_X+box_true[:,:,:,2]) # 1 x S x S + batch x S x S = batch x S x S
  cy_restored_gt_tensor=norm_const*(indices_tensor_Y+box_true[:,:,:,3]) # 1 x S x S + batch x S x S = batch x S x S
  h_restored_gt_tensor=box_true[:,:,:,4]*I_S # batch x S x S
  w_restored_gt_tensor=box_true[:,:,:,5]*I_S # batch x S x S

  # Restored predicted
  cx_restored_pred_tensor=norm_const*(indices_tensor_X+box_pred[:,:,:,B+4*i]) # 1 x S x S + batch x S x S = batch x S x S
  cx_restored_pred_tensor=tf.math.maximum(cx_restored_pred_tensor,0)# To remove negative values
  cy_restored_pred_tensor=norm_const*(indices_tensor_Y+box_pred[:,:,:,B+1+4*i]) # 1 x S x S + batch x S x S = batch x S x S
  cy_restored_pred_tensor=tf.math.maximum(cy_restored_pred_tensor,0)# To remove negative values
  h_restored_pred_tensor=box_pred[:,:,:,B+2+4*i]*I_S # batch x S x S
  h_restored_pred_tensor=tf.math.maximum(h_restored_pred_tensor,0)# To remove negative values
  w_restored_pred_tensor=box_pred[:,:,:,B+3+4*i]*I_S # batch x S x S
  w_restored_pred_tensor=tf.math.maximum(w_restored_pred_tensor,0)# To remove negative values

  # min max of intersection box all, batch x S x S
  x_min_tensor=tf.math.maximum(tf.math.maximum(cx_restored_gt_tensor-w_restored_gt_tensor/2,0),tf.math.maximum(cx_restored_pred_tensor-w_restored_pred_tensor/2,0))
  y_min_tensor=tf.math.maximum(tf.math.maximum(cy_restored_gt_tensor-h_restored_gt_tensor/2,0),tf.math.maximum(cy_restored_pred_tensor-h_restored_pred_tensor/2,0))
  x_max_tensor=tf.math.minimum(cx_restored_gt_tensor+w_restored_gt_tensor/2,cx_restored_pred_tensor+w_restored_pred_tensor/2)
  y_max_tensor=tf.math.minimum(cy_restored_gt_tensor+h_restored_gt_tensor/2,cy_restored_pred_tensor+h_restored_pred_tensor/2)
  w_intersection=tf.math.maximum(x_max_tensor-x_min_tensor,0)
  h_intersection=tf.math.maximum(y_max_tensor-y_min_tensor,0)
  intersection_tensor=w_intersection*h_intersection # batch x S x S
  union_tensor=(w_restored_gt_tensor*h_restored_gt_tensor)+(w_restored_pred_tensor*h_restored_pred_tensor) # batch x S x S
  smooth=1 # smooth term so that we don't divide by 0
  return (intersection_tensor+smooth)/(union_tensor+smooth) #batch x S x S
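
To see what return_iou_tensor actually produces, a small hand-made example can be evaluated outside the model (a minimal sketch assuming TF 1.x, where K.eval runs the tensor through the default session; the chosen cell and box values are arbitrary):

# Hand-made example (arbitrary values) to eyeball what return_iou_tensor outputs.
# Ground-truth cell layout: [conf, cx, cy, h, w, one-hot], prediction layout: [conf, cx, cy, h, w, classes]
toy_true=np.zeros((1,S,S,5+lenClasses),dtype=np.float32)
toy_pred=np.zeros((1,S,S,5*B+C),dtype=np.float32)
toy_true[0,4,4]=[1,0.5,0.5,0.25,0.25,1,0,0]          # a 64x64 box centred in the middle of cell (4,4)
toy_pred[0,4,4]=[1,0.5,0.5,0.25,0.25,0.9,0.05,0.05]
iou_val=K.eval(return_iou_tensor(tf.constant(toy_true),tf.constant(toy_pred),0))
print(iou_val[0,4,4])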

And the mean_iou_metric that is monitored during training:

def mean_iou_metric(y_true,y_pred):
  mean_iou=0.0
  for i in range(0,B):
    iou_tensor=y_true[:,:,:,0]*return_iou_tensor(y_true,y_pred,i)
    mean_iou=mean_iou+K.mean(iou_tensor)
  return mean_iou/B
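
In the same way, the raw magnitude of the loss and the metric can be inspected outside fit() on a tiny dummy batch (again a minimal sketch assuming TF 1.x graph mode; the random predictions are kept in [0,1] so that K.sqrt stays finite):

# Evaluate the loss and the metric on a tiny dummy batch, just to look at the raw numbers.
dummy_true=np.zeros((4,S,S,5+lenClasses),dtype=np.float32)   # empty ground truth
dummy_pred=np.random.rand(4,S,S,5*B+C).astype(np.float32)    # random predictions in [0,1]
print("loss:",K.eval(yolo_loss_trial(tf.constant(dummy_true),tf.constant(dummy_pred))))
print("mean_iou:",K.eval(mean_iou_metric(tf.constant(dummy_true),tf.constant(dummy_pred))))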

Model (based on Fast YOLO):

TLDR; image:

Model image

from keras import backend as K
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, ZeroPadding2D, Flatten, Dense, Reshape

def custom_activation(x):
  # Leaky ReLU; the comparison result has to be cast to float, a bool tensor is not accepted
  isPositive=K.cast(K.greater(x,0),K.floatx())
  # The output of this function is a tensor
  return (isPositive*x)+(1-isPositive)*0.1*x

###############  BLOCK 1 ##############################
input_=Input(shape=(256,256,3),name='input')
#zeropad1=ZeroPadding2D(padding=(3,3))(input_) # PADDING MAKES 448->3+448+3, it is required to bring output to 112

convLayer1=Conv2D(64,(7,7),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer1')(input_)
maxpoolLayer1=MaxPooling2D(pool_size=(2,2),name='max_pool_layer1')(convLayer1)
#zeropad2=ZeroPadding2D(padding=(1,1))(maxpoolLayer1)
########################################################

###############  BLOCK 2 ##############################
convLayer2=Conv2D(192,(3,3),padding='valid',activation=custom_activation,name='conv_layer2')(maxpoolLayer1)
maxpoolLayer2=MaxPooling2D(pool_size=(2,2),name='max_pool_layer2')(convLayer2)
#zeropad3=ZeroPadding2D(padding=(2,2))(maxpoolLayer2)
########################################################

###############  BLOCK 3 ##############################
convLayer3=Conv2D(128,(1,1),padding='valid',activation=custom_activation,name='conv_layer3')(maxpoolLayer2)
convLayer4=Conv2D(256,(3,3),padding='valid',activation=custom_activation,name='conv_layer4')(convLayer3)
#convLayer5=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer5')(convLayer4)
#convLayer6=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer6')(convLayer5)
maxpoolLayer3=MaxPooling2D(pool_size=(2,2),name='max_pool_layer3')(convLayer4)
#zeropad4=ZeroPadding2D(padding=(5,5))(maxpoolLayer3)
########################################################

###############  BLOCK 4 ##############################
convLayer7=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer7')(maxpoolLayer3)
convLayer8=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer8')(convLayer7)
#convLayer9=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer9')(convLayer8)
#convLayer10=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer10')(convLayer9)
#convLayer11=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer11')(convLayer10)
#convLayer12=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer12')(convLayer11)
#convLayer13=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer13')(convLayer12)
#convLayer14=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer14')(convLayer13)
#convLayer15=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer15')(convLayer14)
#convLayer16=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer16')(convLayer15)
maxpoolLayer4=MaxPooling2D(pool_size=(2,2),name='max_pool_layer4')(convLayer8)
#zeropad5=ZeroPadding2D(padding=(4,4))(maxpoolLayer4)
###########################################################

###############  BLOCK 5 ##################################
convLayer17=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer17')(maxpoolLayer4)
convLayer18=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer18')(convLayer17)
#convLayer19=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer19')(convLayer18)
#convLayer20=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer20')(convLayer19)
#convLayer21=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer21')(convLayer20)
#convLayer22=Conv2D(1024,(3,3),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer22')(convLayer21)
#zeropad6=ZeroPadding2D(padding=(2,2))(convLayer18)
#############################################################

################ BLOCK 6 ####################################
convLayer23=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer23')(convLayer18)
#convLayer24=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer24')(convLayer23)
flattenedLayer1=Flatten()(convLayer23) # Flatten converts the 3D feature map to 1D so it can be fed into the next Dense layer
###############################################################

################ BLOCK 7 #########################################
denseLayer1=Dense(units=4096,activation=custom_activation)(flattenedLayer1)
##################################################################

################ BLOCK 8 #########################################
denseLayer2=Dense(units=S*S*(5*B+C),activation='linear')(denseLayer1)
output_=Reshape((S,S,(5*B+C)))(denseLayer2) # Reshapes the 1D to 3D
##################################################################

fast_model=Model(inputs=input_,outputs=output_)
fast_model.summary()
from keras.utils import plot_model
plot_model(fast_model,to_file='unet.png',show_shapes=True)

Latest training parameters:

from keras.optimizers import Adam
from keras import callbacks
from keras.callbacks import ReduceLROnPlateau

model=fast_model # SELECT MODEL
model.save_weights('weights.hdf5')
model.compile(optimizer=Adam(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
#model.compile(optimizer=SGD(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
model.load_weights('weights.hdf5')
checkpointer = callbacks.ModelCheckpoint(filepath = 'weights.hdf5',save_best_only=True)
training_log = callbacks.TensorBoard(log_dir='./Model_logs')
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,patience=3, min_lr=1e-5,mode='auto') # ADD IN CALLBACK in fit()
# patience = number of epochs without improvement before the lr is reduced: new_lr = lr * factor

for i in range(0,10):
  print("Iteration, ",i)
  history=model.fit(X_train,Y_train,validation_data=(X_val,Y_val),batch_size=16,epochs=32,callbacks=[training_log,checkpointer,reduce_lr],shuffle=True)
  # SAVE MODEL TO DRIVE
  !cp '/content/weights.hdf5' 'gdrive/My Drive/Colab Notebooks/Colab Datasets/Breast_Cancer_HNS/Images'
  # CONFIRM EXECUTION TIMESTAMP
  from datetime import datetime
  import pytz
  tz = pytz.timezone('Asia/Calcutta')
  berlin_now = datetime.now(tz)
  dt_string = berlin_now.strftime("%d/%m/%Y %H:%M:%S")
  print(dt_string)
  #from google.colab import output
  #output.eval_js('new Audio("https://ssl.gstatic.com/dictionary/static/sounds/20180430/complete--_us_1.mp3").play()')
# SAVE WHOLE MODEL TO LOCAL/COLAB DRIVE
model.save("FastYolo") #Saves weights also according to official docs
# SAVE MODEL TO GOOGLE DRIVE
!cp '/content/FastYolo' '/content/gdrive/My Drive/Colab Notebooks/Colab Datasets/Shape_Detection_YOLO/None2500NoOverlap'
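
After training, this is roughly how I turn the raw S x S x (5B+C) output back into pixel-space boxes to look at the predictions (a minimal sketch that mirrors my own encoding; the 0.5 confidence threshold is an arbitrary choice):

# Minimal sketch (mirrors the encoding above): decode the network output for one image
# into (confidence, cx, cy, h, w, class) tuples in pixel coordinates.
def decode_prediction(pred_grid,conf_thresh=0.5):
  boxes_out=[]
  for px in range(S):
    for py in range(S):
      cell=pred_grid[px,py]
      if cell[0]<conf_thresh:            # confidence of the (single) box in this cell
        continue
      cx=(cell[B]+px)*norm_const         # undo cx=(cx-posx*norm_const)/norm_const
      cy=(cell[B+1]+py)*norm_const
      h,w=cell[B+2]*I_S,cell[B+3]*I_S
      cls=int(np.argmax(cell[5*B:5*B+C]))
      boxes_out.append((cell[0],cx,cy,h,w,cls))
  return boxes_out

pred=model.predict(X_val[:1])[0]   # one image -> S x S x (5*B+C)
print(decode_prediction(pred))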