I am trying to use SqueezeNet in Caffe2 to classify the ETH Food-101 dataset. My model is imported from the Model Zoo, and I made two kinds of modifications to it:
1) Changed the last layer to have 101 outputs.
2) The images in the database are stored in NHWC form, and I simply flipped the dimensions of the weights to match (I plan to change this); a sketch of what I mean is shown below.
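Concretely, the flip is just a transpose of each conv kernel from NCHW layout (K, C, kH, kW) to NHWC layout (K, kH, kW, C). A minimal sketch of the idea, using conv1_w as a stand-in for each conv weight blob:

import numpy as np
from caffe2.python import workspace as ws

# Sketch: flip a conv weight blob from NCHW layout (K, C, kH, kW) to
# NHWC layout (K, kH, kW, C) so it lines up with NHWC image data.
w = ws.FetchBlob('conv1_w')
ws.FeedBlob('conv1_w', np.ascontiguousarray(w.transpose(0, 2, 3, 1)))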
The Food-101 dataset has 75,000 images for training. I am currently using a batch size of 128, a starting learning rate of -0.01 with a gamma of 0.999, and a stepsize of 1. I noticed that for the first 2000 iterations of the network the accuracy hovered around 1/128, and those iterations took about an hour to complete.
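Note that with the usual "step" decay policy, a stepsize of 1 and a gamma of 0.999 shrink the learning rate on every single iteration, so by iteration 2000 it has already decayed by a factor of about 7. A quick back-of-the-envelope check:

# Effective LR under a "step" policy with stepsize=1, gamma=0.999:
# lr_i = base_lr * gamma ** i, so it shrinks every iteration.
base_lr, gamma = 0.01, 0.999
print base_lr * gamma ** 2000   # ~0.00135 by iteration 2000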
I added all the weights to model.params so that they can be updated during gradient descent (except for data), reinitialized all the weights with Xavier, and set the biases to a constant. I expected accuracy to grow fairly quickly over the first 100 to 1000 iterations and then tail off as the iteration count grows. In my case, the learning stays flat around 0.
When I look at the gradient blobs, I find that the mean is on the order of 10^-6 with a standard deviation on the order of 10^-7. This explains the glacially slow learning, but I have not been able to get the gradients to start out any higher.
These are the gradient statistics for the first convolution after a few iterations:
Min           Max          Avg          Sdev
-1.69821e-05  2.10922e-05  1.52149e-06  5.7707e-06
-1.60263e-05  2.01478e-05  1.49323e-06  5.41754e-06
-1.62501e-05  1.97764e-05  1.49046e-06  5.2904e-06
-1.64293e-05  1.90508e-05  1.45681e-06  5.22742e-06
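(I collect these stats with a small helper along these lines; the blob name follows Caffe2's _grad suffix convention:)

from caffe2.python import workspace as ws

# Sketch: dump min/max/mean/std of the first conv's weight gradient.
g = ws.FetchBlob('conv1_w_grad')
print g.min(), g.max(), g.mean(), g.std()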
Here is the core of my code:
import os
import numpy as np
from caffe2.python import core, model_helper, workspace as ws

# init_path is the path to the init_net protobuf
# pred_path is the path to the pred_net protobuf
def main(init_path, pred_path):
    ws.ResetWorkspace()
    data_folder = '/home/myhome/food101/'
    # some debug code here (this is where the debug flag gets set)
    arg_scope = {"order": "NCHW"}
    train_model = model_helper.ModelHelper(name="food101_train",
                                           arg_scope=arg_scope)
    if not debug:
        data, label = AddInput(
            train_model, batch_size=128,
            db=os.path.join(data_folder, 'food101-train-nchw-leveldb'),
            db_type='leveldb')
    init_net_def, pred_net_def = update_squeeze_net(init_path, pred_path)
    # print str(init_net_def)
    # Graft the pretrained SqueezeNet nets onto the training model.
    train_model.param_init_net.AppendNet(core.Net(init_net_def))
    train_model.net.AppendNet(core.Net(pred_net_def))
    ws.RunNetOnce(train_model.param_init_net)
    add_params(train_model, init_net_def)
    AddTrainingOperators(train_model, 'softmaxout', 'label')
    AddBookkeepingOperators(train_model)
    ws.RunNetOnce(train_model.param_init_net)
    if debug:
        ws.FeedBlob('data', data)
        ws.FeedBlob('label', label)
    ws.CreateNet(train_model.net)

    total_iters = 10000
    accuracy = np.zeros(total_iters)
    loss = np.zeros(total_iters)
    # Now, we will manually run the network for total_iters iterations.
    for i in range(total_iters):
        # try:
        conv1_w = ws.FetchBlob('conv1_w')
        print conv1_w[0][0]
        ws.RunNet("food101_train")
        # except RuntimeError:
        #     print ws.FetchBlob('conv1').shape
        #     print ws.FetchBlob('pool1').shape
        #     print ws.FetchBlob('fire2/squeeze1x1_w').shape
        #     print ws.FetchBlob('fire2/squeeze1x1_b').shape
        # softmax = ws.FetchBlob('softmaxout')
        # print softmax[i]
        # print softmax[i][0][0]
        # print softmax[i][0][:5]
        # print softmax[64*i]
        accuracy[i] = ws.FetchBlob('accuracy')
        loss[i] = ws.FetchBlob('loss')
        print accuracy[i], loss[i]
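For context, AddTrainingOperators is modeled on the standard Caffe2 MNIST tutorial; a sketch of that pattern (not my exact code; note base_lr is negative because the update adds LR * grad):

from caffe2.python import brew

# Sketch of MNIST-tutorial-style training operators: cross-entropy
# loss, accuracy, gradients, a decaying LR, and a plain SGD update
# param += LR * param_grad (with LR negative).
def AddTrainingOperators(model, softmax, label):
    xent = model.LabelCrossEntropy([softmax, label], 'xent')
    loss = model.AveragedLoss(xent, 'loss')
    brew.accuracy(model, [softmax, label], 'accuracy')
    model.AddGradientOperators([loss])
    ITER = brew.iter(model, "iter")
    LR = model.net.LearningRate(
        ITER, "LR", base_lr=-0.01, policy="step", stepsize=1, gamma=0.999)
    ONE = model.param_init_net.ConstantFill([], "ONE", shape=[1], value=1.0)
    for param in model.params:
        param_grad = model.param_to_grad[param]
        model.WeightedSum([param, ONE, param_grad, LR], param)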
My add_params function initializes the weights as follows:
from caffe2.python.modeling import initializers
from caffe2.python.modeling.parameter_info import ParameterTags

# ops lets me initialize the weights of specific ops only, because I
# originally planned to train just the last layer.
def add_params(model, init_net_def, ops=[]):
    def add_param(op):
        for output in op.output:
            if "_w" in output:
                weight_shape = []
                for arg in op.arg:
                    if arg.name == 'shape':
                        weight_shape = arg.ints
                weight_initializer = initializers.update_initializer(
                    None, None, ("XavierFill", {}))
                model.create_param(
                    param_name=output,
                    shape=weight_shape,
                    initializer=weight_initializer,
                    tags=ParameterTags.WEIGHT)
            elif "_b" in output:
                bias_shape = []
                for arg in op.arg:
                    if arg.name == 'shape':
                        bias_shape = arg.ints
                bias_initializer = initializers.update_initializer(
                    None, None, ("ConstantFill", {}))
                model.create_param(
                    param_name=output,
                    shape=bias_shape,
                    initializer=bias_initializer,
                    tags=ParameterTags.BIAS)
    # Register every fill op's output from the init net as a parameter.
    for op in init_net_def.op:
        add_param(op)
I find that when I use the full training set the loss fluctuates, but if I use just a single batch and iterate over it many times, the loss goes down, though very slowly.
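(The single-batch test uses the debug path above, where AddInput is skipped and the blobs are fed by hand; a sketch, assuming data and label hold one fixed NCHW batch:)

# Sketch of the single-batch test: feed the same fixed batch and run
# the train net repeatedly, watching whether the loss can overfit it.
for i in range(500):
    ws.FeedBlob('data', data)     # same float32 image batch every time
    ws.FeedBlob('label', label)   # same labels every time
    ws.RunNet("food101_train")
    print i, ws.FetchBlob('loss')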