I am trying to use KFold on an XGBoost regression problem. A sample of the data is shown below:
When I use the following code:
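import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBClassifier

df = pd.read_csv('../data/df_samp.csv').head(1000)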
cat_columns = ['primary_use','meter','hour','weekday','month','wind_compass']
df_processed = pd.get_dummies(df, prefix_sep="_", columns=cat_columns)
X=df_processed.drop(['meter_reading','outlier_ratio','meter_reading_roll_avg','timestamp'],axis=1)
y=df_processed['meter_reading']
scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    X_train, X_test, y_train, y_test = X.values[train_index], X.values[test_index], y.values[train_index], y.values[test_index]
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    scores.append(r2_score(y_test,predictions))
I get the output:
print(scores)
[0.406908684278529, 0.3320925821156784, 0.1039843686445262, 0.395466094618815, 0.13412072574647682, -0.015579242639622182, -0.17008382837529967, 0.3931056789610018, 0.4491969042604125, 0.49641651402527265]
When I try:
scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, random_state=42, shuffle=False)
cross_val_score(model, X.values, y.values, cv=10)
I get:
ValueError: continuous is not supported
Does anyone know why?
Thanks.
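For what it's worth, a likely cause can be sketched with scikit-learn alone. When no `scoring` argument is given, `cross_val_score` falls back to the estimator's own `score` method, which for scikit-learn style classifiers is accuracy, and accuracy rejects a float-valued target. The manual loop never hits that check because it calls `r2_score` directly. In the sketch below the data is synthetic, `r2_cross_val` is a hypothetical helper name, and `LinearRegression` is only a placeholder so the example runs without xgboost; none of these come from the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the real data (assumption: not the poster's df_samp.csv).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# scikit-learn classifies a float-valued target as "continuous".
target_type = type_of_target(y)
print("target type:", target_type)

# Accuracy (the default score of a classifier) refuses such a target.
try:
    accuracy_score(y, y)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)
print("accuracy_score error:", classifier_error)


def r2_cross_val(estimator, X, y, n_splits=10):
    """Hypothetical helper: k-fold cross-validation scored with R^2."""
    # shuffle=True so that random_state actually has an effect.
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    return cross_val_score(estimator, X, y, cv=cv, scoring="r2")


# A regressor scored with a regression metric goes through without the error.
r2_scores = r2_cross_val(LinearRegression(), X, y)
print("mean R^2:", r2_scores.mean())
```

With xgboost installed, the analogous call for the data in the question would presumably be `r2_cross_val(XGBRegressor(), X.values, y.values)`, since `XGBRegressor` follows the same scikit-learn estimator interface. Note also that recent scikit-learn versions reject `KFold(..., random_state=42, shuffle=False)`, because a seed has no effect without shuffling.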