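I have just started machine learning with Python and am studying multiple linear regression. That is where I learned about the dummy variable trap, which I understand can be handled with backward elimination, but when I apply backward elimination I get this error: PatsyError: model is missing required outcome variables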
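To show what I mean by the dummy variable trap, here is a minimal sketch. It is not part of my actual script: the `toy` frame is a hypothetical stand-in that reuses the gender and age values from the first rows of my dataset, and it uses `pandas.get_dummies` rather than the `LabelEncoder`/`OneHotEncoder` route in my code. As I understand it, keeping one dummy column per category together with an intercept makes the columns linearly dependent, and dropping one level removes the redundancy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: gender and age from the first five rows of my data.
toy = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "female"],
    "age": [17, 18, 18, 14, 18],
})

# One dummy per category: gender_female + gender_male is always 1,
# which duplicates the intercept column added below.
full = pd.get_dummies(toy, columns=["gender"], dtype=float)
full.insert(0, "const", 1.0)

# Dropping the first dummy level removes that redundancy.
reduced = pd.get_dummies(toy, columns=["gender"], drop_first=True, dtype=float)
reduced.insert(0, "const", 1.0)

# Compare the number of columns with the rank of each design matrix.
rank_full = np.linalg.matrix_rank(full.to_numpy(dtype=float))
rank_reduced = np.linalg.matrix_rank(reduced.to_numpy(dtype=float))
print("full:   ", full.shape[1], "columns, rank", rank_full)
print("reduced:", reduced.shape[1], "columns, rank", rank_reduced)
```

If the rank is lower than the number of columns, the matrix contains a redundant column, which is the situation I am trying to avoid.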
These are my imports:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import LabelEncoder , OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as sm
These are the first 5 rows of my dataset:
   gender  age  exercise  hours  grade
0  female   17         3     10   82.4
1    male   18         4      4   78.2
2    male   18         5      9   79.3
3  female   14         2      7   83.2
4  female   18         4     15   87.4
real_x = data_frame.iloc[:,:4].values
real_y = data_frame.iloc[:,4:].values
label_encoder_obj = LabelEncoder()
real_x[:,0] = label_encoder_obj.fit_transform(real_x[:,0])
one_hot_encoder = OneHotEncoder(categorical_features=[2])
real_x = one_hot_encoder.fit_transform(real_x).toarray()
real_x = real_x[:,1:]
training_x, test_x, training_y, test_y = train_test_split(real_x, real_y, test_size=0.2, random_state=0)
multiple_linear_regression = LinearRegression()
multiple_linear_regression.fit(training_x,training_y)
predection_y = multiple_linear_regression.predict(test_x)
real_x=np.append(arr=np.ones((real_x.shape[0],1)).astype(int),
values=real_x,axis=1)
x_optimization = real_x[:,[0,1,2,3,4,5]]
I get the error on the following line:
regresion_ordinary_least_squar = sm.ols(real_y,data=x_optimization).fit();
# if missing == 'raise' there's no missing_mask
PatsyError: model is missing required outcome variables
I have seen some online examples where
sm.OLS()
is used instead of
sm.ols()
What is the difference?