I've implemented a single-variable linear regression model in Python that uses gradient descent to find the intercept and slope of the best-fit line (I'm using gradient descent rather than computing the optimal values of the intercept and slope directly because I'd eventually like to generalize to multiple regression).
The data I'm using are below. sales is the dependent variable (in dollars) and temp is the independent variable (degrees Celsius) — think ice-cream sales vs. temperature, or something like that.
sales temp
215 14.20
325 16.40
185 11.90
332 15.20
406 18.50
522 22.10
412 19.40
614 25.10
544 23.40
421 18.10
445 22.60
408 17.20
And here is the data after I normalize it:
sales temp
0.06993007 0.174242424
0.326340326 0.340909091
0 0
0.342657343 0.25
0.515151515 0.5
0.785547786 0.772727273
0.529137529 0.568181818
1 1
0.836829837 0.871212121
0.55011655 0.46969697
0.606060606 0.810606061
0.51981352 0.401515152
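For context, the normalization above is plain min-max scaling of each column (subtract the column minimum, divide by the column range). Something along these lines produced it — the raw file name here is assumed:

import pandas as pd

# Min-max scale every column into [0, 1]: (x - min) / (max - min).
raw = pd.read_csv('sales.csv')  # raw sales/temp data; file name assumed
normalized = (raw - raw.min()) / (raw.max() - raw.min())
normalized.to_csv('sales_normalized.csv', index=False)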
My code for the algorithm:
import numpy as np
import pandas as pd
from scipy import stats


class SLRegression(object):
    def __init__(self, learnrate=.01, tolerance=.000000001, max_iter=10000):
        # Initialize learnrate, tolerance, and max_iter.
        self.learnrate = learnrate
        self.tolerance = tolerance
        self.max_iter = max_iter

    # Define the gradient descent algorithm.
    def fit(self, data):
        # data : array-like, shape = [m_observations, 2_columns]
        # Initialize local variables.
        converged = False
        m = data.shape[0]
        # Track number of iterations.
        self.iter_ = 0
        # Initialize theta0 and theta1.
        self.theta0_ = 0
        self.theta1_ = 0
        # Compute the cost function.
        J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
        print('J is: ', J)
        # Iterate over each point in data and update theta0 and theta1 on each pass.
        while not converged:
            diftemp0 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) for i in range(m)])
            diftemp1 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) * data[i][1] for i in range(m)])
            # Subtract the learnrate * partial derivative from theta0 and theta1.
            temp0 = self.theta0_ - (self.learnrate * diftemp0)
            temp1 = self.theta1_ - (self.learnrate * diftemp1)
            # Update theta0 and theta1.
            self.theta0_ = temp0
            self.theta1_ = temp1
            # Compute the updated cost function, given new theta0 and theta1.
            new_J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
            print('New J is: %s' % new_J)
            # Test for convergence.
            if abs(J - new_J) <= self.tolerance:
                converged = True
                print('Model converged after %s iterations!' % self.iter_)
            # Set old cost equal to new cost and update iter.
            J = new_J
            self.iter_ += 1
            # Test whether we have hit max_iter.
            if self.iter_ == self.max_iter:
                converged = True
                print('Maximum iterations have been reached!')
        return self

    def point_forecast(self, x):
        # Given feature value x, returns the regression's predicted value for y.
        return self.theta0_ + self.theta1_ * x


# Run the algorithm on a data set.
if __name__ == '__main__':
    # Load in the .csv file.
    data = np.squeeze(np.array(pd.read_csv('sales_normalized.csv')))
    # Create a regression model with the default learning rate, tolerance, and maximum number of iterations.
    slregression = SLRegression()
    # Call the fit function and pass in the data.
    slregression.fit(data)
    # Print out the results.
    print('After %s iterations, the model converged on Theta0 = %s and Theta1 = %s.'
          % (slregression.iter_, slregression.theta0_, slregression.theta1_))
    # Compare our model to the scipy linregress model.
    slope, intercept, r_value, p_value, slope_std_error = stats.linregress(data[:, 1], data[:, 0])
    print('Scipy linear regression gives intercept: %s and slope = %s.' % (intercept, slope))
    # Test the model with a point forecast.
    print('As an example, our algorithm gives y = %s given x = .87.'
          % slregression.point_forecast(.87))  # Should be about .83.
    print('The true y-value for x = .87 is about .8368.')
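For reference, the update inside the while loop is equivalent to this vectorized form. It's only a sketch to show the math more compactly (same column convention: column 0 = y, column 1 = x) and isn't wired into the class above:

import numpy as np

def gradient_step(theta0, theta1, data, learnrate):
    # data[:, 0] is y (sales), data[:, 1] is x (temp).
    y, x = data[:, 0], data[:, 1]
    m = data.shape[0]
    residuals = theta0 + theta1 * x - y          # h(x) - y for every observation
    grad0 = residuals.sum() / m                  # partial derivative w.r.t. theta0
    grad1 = (residuals * x).sum() / m            # partial derivative w.r.t. theta1
    cost = (residuals ** 2).sum() / (2.0 * m)    # J before the update
    return theta0 - learnrate * grad0, theta1 - learnrate * grad1, cost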
I'm having trouble understanding exactly what allows the algorithm to converge versus what makes it return values that are completely wrong. Given learnrate = .01, tolerance = .0000000001, and max_iter = 10000, in combination with the normalized data, I can get the gradient descent algorithm to converge. However, when I use the un-normalized data, the smallest learning rate I can use without the algorithm returning NaN is .005. That brings the change in the cost function from iteration to iteration down to around 614, but I can't get it to go any lower.
Is it an absolute requirement of this kind of algorithm to have normalized data, and if so, why? Also, given that the algorithm needs normalized values, what would be the best way to take a novel x-value in non-normalized form and plug it into the point forecast? For example, if I were delivering this algorithm to a client so they could make predictions of their own (I'm not, but for the sake of argument..), wouldn't I want them to be able to simply plug in an un-normalized x-value?
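To make that last point concrete, what I have in mind is something like the helper below. It is hypothetical (not part of the class above) and assumes the same min-max scaling as my normalized table, with the min/max constants taken from my raw data:

# Hypothetical wrapper: keep the min/max used for normalization so a caller
# can pass in a raw temperature and get a raw sales figure back.
def raw_point_forecast(model, raw_x, x_min, x_max, y_min, y_max):
    x_scaled = (raw_x - x_min) / (x_max - x_min)   # normalize the input
    y_scaled = model.point_forecast(x_scaled)      # predict in scaled space
    return y_scaled * (y_max - y_min) + y_min      # de-normalize the output

# e.g. raw_point_forecast(slregression, 23.4, 11.9, 25.1, 185, 614)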
All told, playing around with tolerance, max_iter, and learnrate gives me non-convergent results most of the time. Is that normal, or is there a flaw in my algorithm that is causing this issue?