I am trying to implement the SGD functionality to manually update the weights in Python in Caffe's Python interface, instead of using solver.step(). The goal is to match the weight update that solver.step() performs by doing the update manually.
The setup is as follows: use the MNIST data. Set the random seed in solver.prototxt with random_seed: 52. Make sure momentum: 0.0 and base_lr: 0.01, lr_policy: "fixed". With the above done, I can implement the plain SGD update equation (no momentum, regularization, etc.). The equation is simply: W_t+1 = W_t - mu * W_t_diff
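For reference, the relevant part of the solver.prototxt would look roughly like this (a minimal sketch; the net path is a placeholder for the actual MNIST net definition, and weight_decay is set to 0 here only to match the no-regularization assumption):

net: "lenet_train_test.prototxt"  # placeholder path to the MNIST net definition
base_lr: 0.01
lr_policy: "fixed"
momentum: 0.0
weight_decay: 0.0  # disable regularization so the plain SGD equation applies
random_seed: 52
solver_mode: GPU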
Here are the two tests:
Test1: Compute the forward and backward passes using Caffe's forward() and backward(). For every layer that contains weights I do:
for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr # biases
Next, update the weights/biases as:
solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff
I run this for 5 iterations.
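As a sanity check on the assumption that blobs[0] holds the weights and blobs[1] the biases, a quick inspection sketch like the following can be run after constructing the solver (layer names and shapes depend on the net definition):

for k in weight_layer_idx:
    layer = solver.net.layers[k]
    # blobs[0]: weights, blobs[1]: biases
    print list(solver.net._layer_names)[k], layer.type, [b.data.shape for b in layer.blobs]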
Test2: Run Caffe's solver.step(5).
Now, I expect these two tests to yield exactly the same weights after the 5 iterations.
I save the weight values after each of the tests above and compute the norm of the difference between the weight vectors from the two tests, and they are not bit-exact. Can anyone spot something I might be missing?
Here is the entire code for reference:
import caffe
caffe.set_device(0)
caffe.set_mode_gpu()
import numpy as np
from copy import copy  # copy() is used below to snapshot the weight arrays
niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')
# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))
# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.
# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)
# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]
for it in range(1, niter+1):
    solver.net.forward() # fprop
    solver.net.backward() # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr # scale weight gradients by the learning rate
        solver.net.layers[k].blobs[1].diff[...] *= lr # scale bias gradients by the learning rate
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff # apply weight update
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff # apply bias update
# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))
# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)
The last line, which compares the weights from the two tests, prints:
after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05
whereas I expected this difference to be 0.0.
Any ideas?