My algorithm is as follows:
The data is stored as:
data = [record1, record2, ... ]
where record1 is [1, x1, x2, ..., x_m], i.e. a leading 1 followed by the m feature values for that record
theta is the parameter vector of the linear regression function; it has size m+1
y holds the true labels and is an array of length len(data) (y[0] is the true value for record 0)
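As a minimal sketch of this layout, assuming the raw features arrive as a plain list of rows (the names `raw_features` and `add_bias` are hypothetical and not part of my original code), the leading 1 can be prepended so that `theta[0]` acts as the intercept:

```python
def add_bias(raw_features):
    """Prepend the constant 1 to every record so theta[0] is the intercept."""
    return [[1.0] + list(row) for row in raw_features]

# Hypothetical toy input: 3 records with m = 2 features each.
raw_features = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y = [10.0, 20.0, 30.0]  # one true label per record

data = add_bias(raw_features)
m = len(raw_features[0])
theta = [0.0] * (m + 1)  # one parameter per feature plus the intercept

print(data[0])     # [1.0, 2.0, 3.0]
print(len(theta))  # 3
```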
Stochastic update for linear regression:
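import numpy as np

iterations = 0
while True:
    # One pass over the training set, updating theta after every record.
    for i in range(len(data)):
        x = data[i]
        # Compute the prediction error once, so every component of theta
        # is updated from the same (old) theta.
        error = np.dot(theta, x) - y[i]
        for t in range(m + 1):  # theta has m+1 entries, including the intercept
            theta[t] = theta[t] - my_lambda * error * x[t]
    iterations += 1
    j_theta = compute_j_of_theta(data, y, theta)
    print("Iteration #: %d  j_theta: %s" % (iterations, j_theta))
    if j_theta < 5000:
        # print("******************** FINALLY CONVERGED!!!! ********************")
        break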
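def compute_j_of_theta(data, y, theta):
    """
    Convergence criterion:
    compute J(theta) = 1/(2M) * sum_t (h_theta(x_t) - y_t)**2
    """
    M = len(data)  # number of training records
    temp = 0.0
    for i in range(M):
        x = data[i]
        temp += (np.dot(theta, x) - y[i]) ** 2
    # Parenthesize the denominator: temp/2*M would evaluate as (temp/2)*M.
    return temp / (2.0 * M)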
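One thing worth checking when computing this cost is operator precedence. The docstring formula divides the sum by 2M, but in Python `/` and `*` have equal precedence and are evaluated left to right. A small arithmetic check, with made-up values for `temp` and `M`:

```python
temp = 8.0  # hypothetical sum of squared errors
M = 4       # hypothetical number of records

wrong = temp / 2 * M    # evaluated left to right: (temp / 2) * M
right = temp / (2 * M)  # the intended 1/(2M) scaling

print(wrong)  # 16.0
print(right)  # 1.0
```

With the unparenthesized form, the reported error grows with the number of records instead of being averaged over them, so errors computed on sets of different sizes are not comparable.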
my_lambda is very small. Initially theta is a zero vector of size m+1.
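For reference, here is a self-contained toy run of the same kind of per-record update, written in plain Python so it runs without NumPy. Everything in it is an assumption for illustration rather than my actual setup: the helper names `dot`, `cost`, and `sgd`, the synthetic noise-free data, the step size 0.05, the fixed number of epochs, and the random shuffling of records (which the loop above does not do).

```python
import random

def dot(a, b):
    """Plain-Python stand-in for np.dot on two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cost(data, y, theta):
    """J(theta) = 1/(2M) * sum of squared prediction errors."""
    M = len(data)
    return sum((dot(theta, x) - yi) ** 2 for x, yi in zip(data, y)) / (2.0 * M)

def sgd(data, y, my_lambda=0.05, epochs=200, seed=0):
    """Stochastic gradient descent: theta is updated after every record."""
    rng = random.Random(seed)
    theta = [0.0] * len(data[0])  # start from the zero vector
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # random visiting order (an addition, see lead-in)
        for i in order:
            error = dot(theta, data[i]) - y[i]
            # Build a new list so all components use the same old theta.
            theta = [t - my_lambda * error * xi for t, xi in zip(theta, data[i])]
    return theta

# Synthetic, noise-free data generated from y = 1 + 2*x1 (demo assumption).
data = [[1.0, k / 10.0] for k in range(20)]
y = [1.0 + 2.0 * row[1] for row in data]

initial_cost = cost(data, y, [0.0, 0.0])
theta = sgd(data, y)
final_cost = cost(data, y, theta)
print(initial_cost, final_cost, theta)
```

On data like this, the final cost should be far below the initial one and theta should end up close to [1, 2]. If it does not, the step size or the cost computation are the first things to check.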
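Question: the training-set error is greater than the test-set error... Why? What is wrong with this?
Edit 1: This was a silly mistake on my part in how I computed the error.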