python - Food101 SqueezeNet Caffe2 number of iterations

Question

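I am trying to classify the ETH Food-101 dataset using SqueezeNet in Caffe2. My model is imported from the Model Zoo, and I made two types of modifications to the model:

1) Changed the dimensions of the last layer to have 101 outputs

2) The images from the database are in NHWC form, and I just flipped the dimensions of the weights to match. (I plan on changing this)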

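The Food101 dataset has 75,000 images for training, and I am currently using a batch size of 128, a starting learning rate of -0.01, a gamma of 0.999, and a stepsize of 1. What I noticed is that for the first 2000 iterations of the network the accuracy hovered around 1/128, and this took an hour or so to complete.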
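For a sense of scale, the settings above can be turned into a few numbers. The sketch below assumes the schedule is a Caffe-style "step" policy, lr = base_lr * gamma^floor(iter / stepsize), and works with the magnitude of the learning rate (the sign is a convention of how the update is applied). The function names are illustrative and not part of any library.

```python
import math

def step_lr(base_lr, gamma, stepsize, iteration):
    """Magnitude of the learning rate under an assumed 'step' decay policy."""
    return abs(base_lr) * gamma ** (iteration // stepsize)

def iterations_per_epoch(num_images, batch_size):
    """Number of batches needed to see every training image once."""
    return int(math.ceil(num_images / float(batch_size)))

# Values quoted in the question.
lr_after_2000 = step_lr(base_lr=-0.01, gamma=0.999, stepsize=1, iteration=2000)
iters_per_epoch = iterations_per_epoch(num_images=75000, batch_size=128)
epochs_seen = 2000.0 / iters_per_epoch
chance_accuracy = 1.0 / 101  # uniform guessing over 101 classes

print("lr after 2000 iterations: %.5f" % lr_after_2000)
print("iterations per epoch: %d" % iters_per_epoch)
print("epochs covered by 2000 iterations: %.2f" % epochs_seen)
print("chance accuracy: %.4f" % chance_accuracy)
```

Under these assumptions the learning rate has already decayed to roughly 14% of its starting value after 2000 iterations, which cover only about 3.4 passes over the data. An accuracy near 1/128 is also slightly below the 1/101 expected from uniform guessing.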

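I added all the weights to model.params so they can get updated during gradient descent (except for the data), reinitialized all weights with Xavier, and set the biases to a constant. I would expect the accuracy to grow quickly in the first 100 to 1000 iterations and then level off as the number of iterations increases. In my case, the learning stays constant around 0.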

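When I look at the gradient file I find that the average is on the order of 10^-6 with a standard deviation of 10^-7. This explains the slow learning, but I haven't been able to get the gradient to start much higher.

These are the gradient statistics for the first convolution after a few iterations: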

Min           Max          Avg          Sdev
-1.69821e-05  2.10922e-05  1.52149e-06  5.7707e-06
-1.60263e-05  2.01478e-05  1.49323e-06  5.41754e-06
-1.62501e-05  1.97764e-05  1.49046e-06  5.2904e-06
-1.64293e-05  1.90508e-05  1.45681e-06  5.22742e-06
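To see why gradients of this size translate into almost no movement in the weights, the per-step update can be estimated as the learning rate times the gradient. The sketch below uses the first row of the table and the learning rate from the question. It assumes plain SGD with no momentum and, for the cumulative figure, a constant learning rate and a gradient that always points the same way, so the result is an upper bound rather than a measurement.

```python
# First row of the gradient statistics above (min, max, avg, sdev).
grad_min, grad_max, grad_avg, grad_sdev = (
    -1.69821e-05, 2.10922e-05, 1.52149e-06, 5.7707e-06)

learning_rate = 0.01   # magnitude of the starting learning rate in the question
num_steps = 2000       # iterations mentioned in the question

# Largest single-step change any conv1 weight could receive under plain SGD.
largest_grad = max(abs(grad_min), abs(grad_max))
max_step_update = learning_rate * largest_grad

# Upper bound on total drift if every step pushed in the same direction
# and the learning rate never decayed (both are simplifying assumptions).
max_total_drift = max_step_update * num_steps

print("largest single-step update: %.3e" % max_step_update)
print("upper bound on drift after %d steps: %.3e" % (num_steps, max_total_drift))
```

Even under these generous assumptions, no conv1 weight moves by more than about 4e-4 over 2000 iterations.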

Below are the core parts of my code:

#init_path is path to init_net protobuf 
#pred_path is path to pred_net protobuf
def main(init_path, pred_path):
    ws.ResetWorkspace()
    data_folder = '/home/myhome/food101/'
    #some debug code here
    arg_scope = {"order":"NCHW"}
    train_model = model_helper.ModelHelper(name="food101_train", arg_scope=arg_scope)
    if not debug:
            data, label = AddInput(
                    train_model, batch_size=128,
                    db=os.path.join(data_folder, 'food101-train-nchw-leveldb'),
                    db_type='leveldb')
    init_net_def, pred_net_def = update_squeeze_net(init_path, pred_path)
    #print str(init_net_def)
    train_model.param_init_net.AppendNet(core.Net(init_net_def))
    train_model.net.AppendNet(core.Net(pred_net_def))
    ws.RunNetOnce(train_model.param_init_net)
    add_params(train_model, init_net_def)
    AddTrainingOperators(train_model, 'softmaxout', 'label')
    AddBookkeepingOperators(train_model)

    ws.RunNetOnce(train_model.param_init_net)
    if debug:
            ws.FeedBlob('data', data)
            ws.FeedBlob('label', label)
    ws.CreateNet(train_model.net)

    total_iters = 10000
    accuracy = np.zeros(total_iters)
    loss = np.zeros(total_iters)
    # Now, we will manually run the network for total_iters iterations.
    for i in range(total_iters):
            #try:
            conv1_w = ws.FetchBlob('conv1_w')
            print conv1_w[0][0]
            ws.RunNet("food101_train")
            #except RuntimeError:
            #       print ws.FetchBlob('conv1').shape
            #       print ws.FetchBlob('pool1').shape
            #       print ws.FetchBlob('fire2/squeeze1x1_w').shape
            #       print ws.FetchBlob('fire2/squeeze1x1_b').shape
            #softmax = ws.FetchBlob('softmaxout')
            #print softmax[i]
            #print softmax[i][0][0]
            #print softmax[i][0][:5]
            #print softmax[64*i]
            accuracy[i] = ws.FetchBlob('accuracy')
            loss[i] = ws.FetchBlob('loss')
            print accuracy[i], loss[i]

My add_params function initializes the weights as follows:

#ops allows me to only initialize the weights of specific ops because i initially was going to do last layer training
def add_params(model, init_net_def, ops=[]):
    def add_param(op):
            for output in op.output:
                    if "_w" in output:
                            weight_shape = []
                            for arg in op.arg:
                                    if arg.name == 'shape':
                                            weight_shape = arg.ints
                            weight_initializer = initializers.update_initializer(
                                                    None,
                                                    None,
                                                    ("XavierFill", {}))
                            model.create_param(
                                    param_name=output,
                                    shape=weight_shape,
                                    initializer=weight_initializer,
                                    tags=ParameterTags.WEIGHT)
                    elif "_b" in output:
                            weight_shape = []
                            for arg in op.arg:
                                    if arg.name == 'shape':
                                            weight_shape = arg.ints
                            weight_initializer = initializers.update_initializer(
                                                    None,
                                                    None,
                                                    ("ConstantFill", {}))
                            model.create_param(
                                    param_name=output,
                                    shape=weight_shape,
                                    initializer=weight_initializer,

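I find that my loss function fluctuates when I use the full training set, but if I use just one batch and iterate over it several times, the loss function goes down, though very slowly.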
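A useful reference point when reading the loss curve is the value a classifier produces when it assigns equal probability to every class. The sketch below computes that baseline for softmax cross-entropy. It assumes the 'loss' blob is an averaged cross-entropy in natural-log units, which the code above does not show explicitly.

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy loss when every class gets probability 1/num_classes."""
    return -math.log(1.0 / num_classes)

baseline = uniform_cross_entropy(101)
print("loss of a uniform prediction over 101 classes: %.3f" % baseline)
```

A loss that fluctuates around this value (about 4.6) suggests the network is still predicting close to uniformly, while a loss clearly below it on the single batch indicates that the weights are being updated.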

score 1 · Accepted Answer

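While SqueezeNet has 50x fewer parameters than AlexNet, it is still a very large network. The original paper doesn't mention a training time, but the SqueezeNet-based SQ required 22 hours to train using two Titan X graphics cards, and that's with some of the weights pre-trained! I haven't gone over your code in detail, but what you describe is expected behavior: your network is able to learn on the single batch, just not as quickly as you expected.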

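I suggest reusing as many of the weights as possible instead of reinitializing them, just as the creators of SQ did. This is known as transfer learning, and it works because many of the lower-level features (lines, curves, basic shapes) in an image are the same regardless of the image's content, and reusing the weights for these layers saves the network from having to re-learn them from scratch.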
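In terms of the question's setup, this amounts to keeping the pretrained values for every blob except those of the resized final layer. The sketch below shows only the selection logic, on plain blob names. The helper `split_params` is hypothetical, and the assumption that the final layer's blobs are named `conv10_w` and `conv10_b` should be checked against the actual init_net.

```python
def split_params(blob_names, reinit_prefixes):
    """Partition parameter blobs into those to keep and those to reinitialize."""
    keep, reinit = [], []
    for name in blob_names:
        if any(name.startswith(prefix) for prefix in reinit_prefixes):
            reinit.append(name)
        else:
            keep.append(name)
    return keep, reinit

# Illustrative blob names: 'conv1_w' and 'fire2/squeeze1x1_*' appear in the
# question; 'conv1_b' and the 'conv10_*' names are assumptions.
blobs = ["conv1_w", "conv1_b", "fire2/squeeze1x1_w", "fire2/squeeze1x1_b",
         "conv10_w", "conv10_b"]
keep, reinit = split_params(blobs, reinit_prefixes=("conv10_",))

print("keep pretrained: %s" % keep)
print("reinitialize: %s" % reinit)
```

Only the blobs in `reinit` would then be passed through an initializer such as XavierFill, while the rest keep the values loaded from the pretrained init_net.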
