I have a question about updating theta during stochastic gradient descent (SGD). There are two ways I could update theta:
1) Use the previous theta to compute the hypotheses for all samples up front, then update theta once per sample. Like this:
hypothese = np.dot(X, theta)
for i in range(0, m):
    theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
2) The other way: while scanning through the samples, compute each hypothesis with the latest theta. Like this:
for i in range(0, m):
    h = np.dot(X[i], theta)
    theta = theta + alpha * (y[i] - h) * X[i]
I have checked other SGD code, and the second way seems to be the correct one. But in my experiments the first way converges faster, and its result is better than the second one's. Why does the wrong way outperform the correct way?
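To make "converges faster" concrete: after each outer iteration I compare the mean squared error of the fit, using a small helper like the one below (this helper is my own measuring code, not part of the two methods above):

import numpy as np

def mse(X, y, theta):
    # Mean squared error of the current fit; lower means closer to convergence.
    residual = y - np.dot(X, theta)
    return np.mean(residual ** 2)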
I have also attached the complete code below:
import numpy as np

def SGD_method1():
    maxIter = 100        # max iterations
    alpha = 1e-4         # learning rate
    m, n = np.shape(X)   # X[m,n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for iter in range(0, maxIter):
        hypothese = np.dot(X, theta)  # compute all hypotheses with the same theta
        for i in range(0, m):
            theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
    return theta
def SGD_method2():
    maxIter = 100        # max iterations
    alpha = 1e-4         # learning rate
    m, n = np.shape(X)   # X[m,n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for iter in range(0, maxIter):
        for i in range(0, m):
            h = np.dot(X[i], theta)  # one hypothesis, using the latest theta
            theta = theta + alpha * (y[i] - h) * X[i]
    return theta
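For reference, here is how I test the two functions. The synthetic X, y, and true_theta below are made up purely for illustration (both functions read X and y as module-level globals, matching the code above), and mse is the helper from earlier in the post:

import numpy as np

# Hypothetical test data; any regression set of shape X[m, n], y[m] works.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])  # made-up ground truth
y = np.dot(X, true_theta) + 0.01 * rng.normal(size=100)

theta1 = SGD_method1()
theta2 = SGD_method2()
print("method 1:", theta1, "mse:", mse(X, y, theta1))
print("method 2:", theta2, "mse:", mse(X, y, theta2))

As an aside, method 1 is not plain batch gradient descent either: batch GD would update theta only once per pass over the data, summing the gradient over all samples, whereas method 1 still applies m per-sample updates with stale hypotheses. A batch update, for comparison (again only a sketch), would look like:

def batch_GD():
    # One theta update per epoch, using the gradient summed over all samples.
    maxIter = 100
    alpha = 1e-4
    m, n = np.shape(X)
    theta = np.zeros(n)
    for _ in range(0, maxIter):
        hypothese = np.dot(X, theta)
        theta = theta + alpha * np.dot(X.T, y - hypothese)
    return theta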