I am working with a highly imbalanced dataset. During hyperparameter tuning, I noticed that if colsample_bytree is set to any value other than 1, the AUC score from sklearn's cross_val_score differs from the one returned by xgb.cv.
Code for xgb.cv:
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold

# creating stratified k-folds
kfolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=16)
# creating the model object and reusing its parameters for xgb.cv
xgb0 = XGBClassifier(objective='binary:logistic', n_estimators=2, colsample_bytree=0.6,
                     random_state=16, n_jobs=-1, eval_metric='auc')
params = xgb0.get_params()
xg_train = xgb.DMatrix(X_train_p.values, label=y_train.values)
cv_result = xgb.cv(params, xg_train, num_boost_round=2, folds=kfolds, metrics='auc',
                   early_stopping_rounds=50, as_pandas=True, seed=16,
                   stratified=True, shuffle=True)
print(cv_result['test-auc-mean'].values[-1])
This gives a test-auc-mean of 0.91706.
Code for cross_val_score:
from sklearn.model_selection import cross_val_score

cv_score = cross_val_score(xgb0, X_train_p, y_train, cv=kfolds, n_jobs=-1, scoring='roc_auc')
print(cv_score.mean())
This gives a mean AUC of 0.8994.
I don't understand the large gap between these two scores. As noted, if colsample_bytree is set to 1 there is no difference between them, and the gap in AUC grows noticeably as colsample_bytree decreases.
Can someone help me understand why this happens? Thanks.