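In the example below, gradient descent finds the correct slope (m) but fails completely on the intercept (b), which always stays close to zero unless I give b a learning rate 1000 times larger.
Why does this happen? Do different kinds of parameters need different learning rates?
Example result without the 1000x learning rate for b: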
m=3.1509653303 b=0.0360896063255
Example result with the 1000x learning rate for b:
m=3.14160584013 b=6.27263311371
What is going on here?
N = 1000
# Synthetic data: y = 3.14159 * x + 2 * 3.14159 for x = 0 .. N-1
data = [x * 3.14159 + 3.14159 * 2 for x in range(N)]
m_param = b_param = 0
learning_rate = .000001
b_learning_rate = learning_rate * 1000  # the 1000x boost for b
last_total_error = float('inf')
for i in range(10000):
    m_grad = 0
    b_grad = 0
    total_error = 0
    for x, y in enumerate(data):
        guess = m_param * x + b_param
        err = y - guess
        total_error += err ** 2
        # Partial derivatives of the mean squared error
        m_grad += -(2./N) * x * err
        b_grad += -(2./N) * err
    # Stop once the total error no longer changes
    if last_total_error == total_error and i > 20:
        break
    last_total_error = total_error
    m_param -= m_grad * learning_rate
    b_param -= b_grad * b_learning_rate
print('params', m_param, b_param)
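One way to probe the question is to compare how steeply the error curves along m and along b. The sketch below is only a rough illustration, assuming the loss is the mean squared error that the loop above minimizes and that x runs over 0 .. N-1. The helper names `curvature` and `slowest_decay` are hypothetical and not part of the original code.

```python
import math

def curvature(n):
    """Second derivatives of the mean squared error for x = 0 .. n-1.

    Returns (d2E/dm2, d2E/dm db, d2E/db2). These do not depend on the
    current values of m and b because the loss is quadratic.
    """
    h_mm = 2.0 * sum(x * x for x in range(n)) / n
    h_mb = 2.0 * sum(range(n)) / n
    h_bb = 2.0
    return h_mm, h_mb, h_bb

def slowest_decay(n, learning_rate, steps):
    """Fraction of the error left along the slowest direction after
    `steps` updates when both parameters share one learning rate."""
    h_mm, h_mb, h_bb = curvature(n)
    trace = h_mm + h_bb
    det = h_mm * h_bb - h_mb * h_mb
    # Eigenvalues of the symmetric 2x2 curvature matrix.
    largest = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0
    smallest = det / largest  # avoids cancellation in (trace - sqrt) / 2
    return smallest, (1.0 - learning_rate * smallest) ** steps

h_mm, h_mb, h_bb = curvature(1000)
smallest, remaining = slowest_decay(1000, 1e-6, 10000)
print('curvature along m:', h_mm)
print('curvature along b:', h_bb)
print('smallest eigenvalue:', smallest)
print('fraction of slow-direction error left:', remaining)
```

With N = 1000 the curvature along m comes out several hundred thousand times larger than the curvature along b. Under the assumptions above, a learning rate small enough to keep m stable would barely move b within 10000 iterations, which seems consistent with the intercept staying near zero.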