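I am trying to run a linear regression on the Black Friday dataset. In the model training part, I define the X and y values from the dataset and then perform a train/test split.

I then train my model with LinearRegression. After that, I try to draw a scatter plot, but I get ValueError: x and y must be the same size.

Note: I have already imported the dataset as "df".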

# Importing the necessary modules.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Creating the variables X and y.
X = df.drop('Purchase', axis=1).values
y = df['Purchase'].values

# Splitting the dataframe to create a training and testing data set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# This creates a LinearRegression object
lm = LinearRegression()

# Fit a linear model, calculate the root mean squared error and the R2 score.
lm.fit(X_train, y_train)

y_pred_linear = lm.predict(X_test)
y_train_predict = lm.predict(X_train)

rmse_train = np.sqrt(mean_squared_error(y_train, y_train_predict))
r2_train = r2_score(y_train, y_train_predict)

rmse = np.sqrt(mean_squared_error(y_test, y_pred_linear))
r2 = r2_score(y_test, y_pred_linear)

print('Root mean squared error on Training Set', rmse_train)
print('R2 score on Training Set: ', r2_train)

print('Root mean squared error on Test Set', rmse)
print('R2 score on Testing Set: ', r2)

plt.scatter(X_train, y_train, s=10)

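When I run X.shape I get (537577, 83), but when I run y.shape I get (537577,).

The ValueError is raised by the scatter plot call. Basically, I want to draw a scatter plot of the predicted results against the actual results.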

1 Answer

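The plot you are aiming for is probably not very useful. Essentially, X_train holds 83 different variables while y_train holds only one, which is why a single scatter call fails. If plotting each of those variables against y is what you want, this should do the trick.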

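import numpy as np
import matplotlib.pyplot as plt

# Plot a random subset of rows so the figure stays readable.
number_of_data_to_plot = 500
random_sample = np.random.randint(0, X_train.shape[0], number_of_data_to_plot)

# One scatter series per feature column, each plotted against the target.
for i in range(X_train.shape[1]):
    plt.scatter(X_train[random_sample, i], y_train[random_sample])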
answered 2019-07-26T18:55:44.587
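
The question also asks for a scatter plot of predicted against actual values, which the loop above does not produce. One way to get it is to pass y_test and y_pred_linear to scatter, since both are 1-D arrays with one entry per test row. The following is only a minimal sketch: the synthetic X and y stand in for the real "df" (an assumption, as are the 5 feature columns and the coefficients), and the helper name plot_predicted_vs_actual is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 1000 rows, 5 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

lm = LinearRegression()
lm.fit(X_train, y_train)
y_pred_linear = lm.predict(X_test)


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter predictions against actual values (hypothetical helper)."""
    fig, ax = plt.subplots()
    # Both inputs are 1-D and equally long, so scatter accepts them.
    ax.scatter(y_true, y_pred, s=10)
    # Diagonal reference line: points on it are predicted exactly.
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    ax.plot(lims, lims, color="red")
    ax.set_xlabel("Actual Purchase")
    ax.set_ylabel("Predicted Purchase")
    return fig, ax


fig, ax = plot_predicted_vs_actual(y_test, y_pred_linear)
# Call plt.show() in an interactive session, or fig.savefig(...) to write a file.
```

With the real data, the same call would take the y_test and y_pred_linear arrays from the question unchanged. The original error comes from handing scatter a two-dimensional X_train alongside a one-dimensional y_train.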