python - "invalid literal for float()" error in Python

Question

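I am trying to perform linear regression in Python using the sklearn library.

This is the code I use to train and fit the model. The error occurs when I call the predict function.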

train, test = train_test_split(h1, test_size = 0.5, random_state=0)

my_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']
trainInp = train[my_features]

target = ['price']
trainOut = train[target]

regr = LinearRegression()

# Train the model using the training sets

regr.fit(trainInp, trainOut)

print('Coefficients: \n', regr.coef_)

testPred = regr.predict(test)

After fitting the model, when I try to predict on the test data, the following error is raised:

Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Python.py", line 52, in <module>
    testPred = regr.predict(test)
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\linear_model\base.py", line 200, in predict
    return self._decision_function(X)
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\linear_model\base.py", line 183, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "C:\Users\gouta\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 393, in check_array
    array = array.astype(np.float64)
ValueError: invalid literal for float(): 20140604T000000

The coefficients of the linear regression model are:

('Coefficients: \n', array([[ -5.04902429e+04,   5.23550164e+04,   2.90631319e+02,
         -1.19010351e-01,  -1.25257545e+04,   6.52414059e+02]]))

Below are the first five rows of the test dataset.

Is the error caused by the large coefficient values? How can I fix this?

score 3 · Accepted Answer

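The problem is that you fit the model on a selected subset of the DataFrame's columns (via trainInp = train[my_features]), but then try to predict on the full set of columns (regr.predict(test)), which includes non-numeric ones such as date. The value in the error message, 20140604T000000, is one of those date strings; the size of the coefficients has nothing to do with it.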
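A minimal sketch of why the conversion fails, assuming the date column holds strings like the one in the traceback. The two-row frame below is made up for illustration; only the date format and column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `test` frame (values are made up).
test = pd.DataFrame({
    "date": ["20140604T000000", "20141209T000000"],
    "bedrooms": [3, 2],
    "sqft_living": [1180, 770],
})

# The traceback shows sklearn's check_array casting the input to float64.
# The same cast fails here because the date strings are not numbers.
try:
    test.astype(np.float64)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("cast failed:", exc)

# Listing the non-numeric columns shows which ones to leave out.
non_numeric = test.select_dtypes(exclude="number").columns.tolist()
print("non-numeric columns:", non_numeric)
```

Checking the non-numeric columns this way before calling predict is a quick way to spot the offending input.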

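So instead of regr.predict(test), call regr.predict(test[my_features]). More generally, keep in mind that whatever preprocessing you apply to the training set (normalization, feature selection, PCA, ...) must also be applied to the test set.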
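The fix can be sketched end to end as follows. This assumes a recent scikit-learn, where train_test_split lives in sklearn.model_selection (the asker's older install may import it from elsewhere). The small h1 frame and the two-column feature list are made up so the snippet runs on its own.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the question's h1 DataFrame.
h1 = pd.DataFrame({
    "date": ["20140604T000000"] * 8,
    "bedrooms": [3, 2, 4, 3, 5, 2, 3, 4],
    "sqft_living": [1180, 770, 1960, 1680, 2570, 1060, 1715, 1890],
    "price": [221900.0, 180000.0, 604000.0, 510000.0,
              1225000.0, 291850.0, 257500.0, 323000.0],
})

# Shortened feature list for illustration; the question uses six columns.
my_features = ["bedrooms", "sqft_living"]

train, test = train_test_split(h1, test_size=0.5, random_state=0)

regr = LinearRegression()
regr.fit(train[my_features], train["price"])

# Select the same columns at prediction time that were used for fitting.
test_pred = regr.predict(test[my_features])
print(test_pred)
```

Keeping the column list in a single variable and reusing it for both fit and predict makes it harder for the two to drift apart.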

Alternatively, you can reduce the data to the features of interest before doing the train/test split:

my_features = ['bedrooms', 'bathrooms', ...]
train, test = train_test_split(h1[my_features], test_size = 0.5, random_state=0)
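One thing to watch with this variant: if my_features lists only the input columns, the split frames no longer contain price. A possible way around that, sketched here with made-up data and a shortened feature list, is to pass the features and the target to train_test_split separately so both are split with the same row assignment.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the question's h1 DataFrame.
h1 = pd.DataFrame({
    "date": ["20140604T000000"] * 8,
    "bedrooms": [3, 2, 4, 3, 5, 2, 3, 4],
    "sqft_living": [1180, 770, 1960, 1680, 2570, 1060, 1715, 1890],
    "price": [221900.0, 180000.0, 604000.0, 510000.0,
              1225000.0, 291850.0, 257500.0, 323000.0],
})

# Shortened feature list for illustration.
my_features = ["bedrooms", "sqft_living"]

# Reduce to the numeric inputs and the target before splitting.
X = h1[my_features]
y = h1["price"]

# Passing X and y together splits both with the same row assignment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

regr = LinearRegression().fit(X_train, y_train)
test_pred = regr.predict(X_test)  # X_test never contained the date column
print(test_pred)
```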
