python - Caffe HDF5 not learning

Question

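I am using Caffe to fine-tune the GoogleNet network on my own dataset. If I use an IMAGE_DATA layer as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions I require, and when I use the HDF5 layer, no learning takes place.

I use exactly the same input images, and the labels match as well. I have also checked that the data in the .h5 files can be loaded correctly. It can, and Caffe also finds the number of examples I provide and the correct number of classes (2).

This makes me think the problem lies in the transformations I perform manually (since the HDF5 layer does not perform any built-in transformations). The code is below. I do the following: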

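Convert the image from RGB to BGR
Resize it to 256x256 so that I can subtract the ImageNet mean file (included with Caffe)
I do not divide by 255, since the original GoogleNet prototxt does not either (see here)
Resize the image to 224x224, the crop size GoogleNet requires
Transpose the image to CxHxW, as Caffe requires
I currently do not perform data augmentation; it can be switched on with oversample=True.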
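The listed steps can be written out in plain NumPy to make the order of operations explicit. This is a minimal sketch, not the poster's code: it assumes the image has already been resized to 256x256, that the mean array is 3x256x256 in BGR order (which is how the Caffe classification tutorial treats ilsvrc_2012_mean.npy), and it center-crops to 224x224 instead of resizing, which is roughly what a Caffe data layer with crop_size does in the TEST phase. The function name preprocess is hypothetical.

```python
import numpy as np

CROP = 224  # GoogleNet input size


def preprocess(img_rgb_hwc, mean_bgr_chw, crop=CROP):
    """Sketch of the listed steps for one image already resized to 256x256.

    img_rgb_hwc:  uint8 array, H x W x 3, RGB order (as most image readers return)
    mean_bgr_chw: float array, 3 x H x W, BGR order (assumed mean-file layout)
    """
    img = img_rgb_hwc.astype(np.float32)  # float first, so subtraction cannot wrap around
    img = img[:, :, ::-1]                 # RGB -> BGR
    img = img.transpose(2, 0, 1)          # H x W x C -> C x H x W
    img = img - mean_bgr_chw              # per-pixel mean subtraction, still on the 0-255 scale
    # Center crop to crop x crop (instead of resizing 256 -> 224)
    _, h, w = img.shape
    top = (h - crop) // 2
    left = (w - crop) // 2
    return img[:, top:top + crop, left:left + crop]
```

Keeping the values on the 0-255 scale matches the point above about not dividing by 255.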

Can anyone see anything wrong with this approach? Is data augmentation so important that no learning happens without it?

HDF5 conversion code

import numpy as np
from scipy import ndimage  # ndimage.imread requires SciPy < 1.2 (it was removed in 1.2)
import caffe

IMG_RESHAPE = 224
IMG_UNCROPPED = 256

def resize_convert(img_names, path=None, oversample=False):
    '''
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the ImageNet mean. If oversample is True,
    perform data augmentation.

    Parameters:
    ---------
    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced
        from each image. If False, then a single 224x224 image is produced.
    '''

    path = path if path is not None else ''
    if not oversample:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')

    # load the ImageNet mean (3 x 256 x 256)
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')

    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path + img_name, mode='RGB')  # read as HxWxC

        # subtract the mean of ImageNet
        # first, resize to 256 so we can subtract the mean of dims 256x256
        img = img[..., ::-1]  # convert RGB to BGR
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1))  # HxWxC => CxHxW
        # the mean is given in Caffe channel order, 3xHxW;
        # assume it is also given in BGR order
        img = img - mean_val

        # set to 0-1 range => I don't think GoogleNet requires this
        # I tried both and it didn't make a difference
        # img = img/255

        # resize images down since GoogleNet accepts 224x224 crops
        if not oversample:
            img = np.transpose(img, (1, 2, 0))  # CxHxW => HxWxC
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2, 0, 1))  # convert back to CxHxW for Caffe
        all_imgs[i, :, :, :] = img

    # oversampling requires HxWxC order: NxCxHxW => NxHxWxC
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1))
        # yields 10 crops per image (4 corners + center, plus their mirrors)
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2))  # NxHxWxC => NxCxHxW for Caffe

    return all_imgs
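Because all_imgs is N x C x H x W, converting it to the N x H x W x C layout that caffe.io.oversample works on needs the permutation (0, 2, 3, 1), and converting the crops back needs (0, 3, 1, 2); applying them the other way round yields an N x W x C x H array. Oversampling also turns every image into 10 crops, so the label array has to grow to match. The helpers below are a small NumPy sketch of both points; the function names are hypothetical, and the label expansion assumes the 10 crops of each image are stored consecutively.

```python
import numpy as np


def nchw_to_nhwc(batch):
    # N x C x H x W -> N x H x W x C (the layout caffe.io.oversample works on)
    return batch.transpose(0, 2, 3, 1)


def nhwc_to_nchw(batch):
    # N x H x W x C -> N x C x H x W (the layout Caffe blobs use)
    return batch.transpose(0, 3, 1, 2)


def expand_labels(labels, crops_per_image=10):
    # Assumes each image's crops are consecutive, so every label
    # is simply repeated crops_per_image times in place.
    return np.repeat(labels, crops_per_image)


# Dummy batch: 2 images, 3 channels, 5 x 7 pixels
batch = np.random.default_rng(0).random((2, 3, 5, 7)).astype(np.float32)
print(nchw_to_nhwc(batch).shape)           # (2, 5, 7, 3)
print(batch.transpose(0, 3, 1, 2).shape)   # (2, 7, 3, 5): wrong permutation for NCHW input
print(expand_labels(np.array([0, 1]), 3))  # [0 0 0 1 1 1]
```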

Relevant differences between the IMAGE_DATA and HDF5 prototxt files

name: "GoogleNet"
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/train_list.txt"
    batch_size: 32
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/valid_list.txt"
    batch_size: 10
  }
  include: { phase: TEST }
}
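Caffe's HDF5Data layer reads, from each .h5 file listed (one path per line) in the source file, the datasets whose names match the layer's tops, here "data" and "label". A quick check of the arrays before writing them can rule out layout and label mistakes. This is a sketch: the function name check_hdf5_arrays, the expected 3x224x224 shape, and labels being class indices 0..num_classes-1 are assumptions.

```python
import numpy as np


def check_hdf5_arrays(data, label, num_classes=2, crop=224):
    """Return a list of problems found in arrays meant for the
    'data' and 'label' datasets of one .h5 file (empty list = OK)."""
    problems = []
    if data.dtype != np.float32:
        problems.append("data should be float32, got %s" % data.dtype)
    if data.ndim != 4 or data.shape[1:] != (3, crop, crop):
        problems.append("data should be N x 3 x %d x %d, got %s" % (crop, crop, data.shape))
    if label.shape[0] != data.shape[0]:
        problems.append("label count %d != image count %d" % (label.shape[0], data.shape[0]))
    if label.size and (label.min() < 0 or label.max() >= num_classes):
        problems.append("labels outside 0..%d" % (num_classes - 1))
    return problems
```

If the checks pass, the arrays could then be written with h5py, for example via create_dataset('data', data=data) and create_dataset('label', data=label) on an open h5py.File.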

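Update

By "no learning takes place" I mean that, with HDF5 data, my training loss does not decrease steadily the way it does with IMAGE_DATA. In the plots below, the first shows the training loss of the IMAGE_DATA network and the second that of the HDF5 data network.

One possibility I am considering is that the network is overfitting to each .h5 file I give it. I am currently using data augmentation, but all augmented examples are stored in a single .h5 file together with the other examples. Since all augmented versions of a given input image are in the same .h5 file, I think this might cause the network to overfit to that particular file. However, I am not sure whether this is what the second plot suggests.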
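One way to test the overfitting hypothesis is to shuffle the examples (and their labels, with the same permutation) before writing them, and to split them over several .h5 files so that the augmented copies of one image do not all sit together. Caffe's hdf5_data_param also has a shuffle flag (false by default); as far as I know it shuffles the file order and the order within each file, but does not interleave samples across files. The sketch below is a hypothetical helper, not part of the original code.

```python
import numpy as np


def shuffle_and_chunk(data, label, chunk_size, seed=0):
    """Jointly shuffle images and labels, then split them into chunks,
    one chunk per .h5 file, so augmented copies of an image are spread out."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))  # one permutation for both arrays keeps pairs aligned
    data, label = data[order], label[order]
    return [(data[i:i + chunk_size], label[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```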

Accepted Answer

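I ran into the same problem and found that, for some reason, doing the transformations manually in my code turned the images completely black (all zeros). Try debugging your code to see whether this happens. The solution is to use the same approach explained in the Caffe tutorial at http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb, in the part that reads: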

import caffe

# net (a loaded caffe.Net) and mu (the per-channel BGR mean)
# are defined earlier in the tutorial
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

and then, a few lines further down:

image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
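To follow the suggestion of checking for black images, a small check can flag images whose pixels all have (nearly) the same value, such as all zeros. Run it on a batch before mean subtraction, because after subtracting the mean a black image becomes the negated mean rather than a constant. The function name and tolerance are hypothetical.

```python
import numpy as np


def report_blank_images(batch, tol=1e-6):
    """Return indices of images in an N x C x H x W batch whose pixel
    values are all (nearly) equal, e.g. all zeros after a broken conversion."""
    flat = batch.reshape(len(batch), -1)
    spread = flat.max(axis=1) - flat.min(axis=1)  # max - min per image
    return np.flatnonzero(spread <= tol).tolist()
```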
