scikit-learn - Strange behavior of roc_auc_score, 'roc_auc', 'auc'

Question

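While tuning xgboost parameters, I ran into a problem with the roc_auc_score metric. The score I get during cross-validation differs significantly from the score I get on the training data afterwards.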

import optuna
import xgboost as xgb


class OptunaHyperparamsSearch:
    def __init__(self, X_train, y_train, **kwargs):
        ...

    def objective(self, trial):

        ...

        cv_results = xgb.cv(param, self.dtrain, num_boost_round=5, metrics=['auc'], nfold=5, verbose_eval=True)

        # Best mean test AUC and the (0-based) boosting round where it occurs
        mean_auc = cv_results['test-auc-mean'].max()
        boost_rounds = cv_results['test-auc-mean'].idxmax()

        param['n_estimators'] = boost_rounds
        trial.set_user_attr('param', param)

        print('boost_rounds: ', boost_rounds)
        print('train-auc-mean', cv_results['train-auc-mean'][boost_rounds])

        return mean_auc

    def best_model(self, n_trials=100, save_path=None):

        study = optuna.create_study(direction="maximize")
        study.optimize(self.objective, n_trials=n_trials)

        # Refit a classifier on the full training set with the best trial's parameters
        best_params = study.best_trial.user_attrs['param']
        best_model = xgb.XGBClassifier(**best_params)
        best_model.fit(self.X_train, self.y_train)

        return best_model

After running the code:

search = OptunaHyperparamsSearch(X_train, y_train)
model = search.best_model(n_trials=1)

I got:

[0] train-auc:0.777869+0.00962852   test-auc:0.771169+0.025347
[1] train-auc:0.786905+0.00865646   test-auc:0.777492+0.0255523
[2] train-auc:0.793305+0.00480249   test-auc:0.785307+0.0198732
[3] train-auc:0.79595+0.00349561    test-auc:0.789897+0.0158569
[4] train-auc:0.796818+0.00407504   test-auc:0.789997+0.016069
boost_rounds:  4
train-auc-mean 0.796818
[I 2020-06-04 10:12:25,093] Finished trial#0 with value: 0.7899968 with parameters: {'booster': 'dart', 'reg_lambda': 0.8001057111479173, 'reg_alpha': 0.0016960618598770582, 'max_depth': 8, 'min_child_weight': 4, 'learning_rate': 0.0602235073221647, 'gamma': 0.0011248451567255984, 'colsample_bytree': 0.911487203002922, 'subsample': 0.9057485217255851, 'grow_policy': 'lossguide', 'scale_pos_weight': 0.5865962792358733, 'sample_type': 'weighted', 'normalize_type': 'tree', 'rate_drop': 0.0009459988874640169, 'skip_drop': 8.103200442539776e-05}. Best is trial#0 with value: 0.7899968.

So the result is about 0.8 (train-auc-mean 0.796818). Then I ran:

y_pred = model.predict(X_train)
print(roc_auc_score(y_train, y_pred))

I got:

0.598231710442728

That should not be possible. I also tried a custom metric function:

import numpy as np
import xgboost as xgb
from sklearn.metrics import roc_auc_score

def PyAUC(predt: np.ndarray, dtrain: xgb.DMatrix):
    y = dtrain.get_label()
    return 'PyAUC', roc_auc_score(y, predt)

I passed it to xgb.cv as feval, set param['disable_default_eval_metric'] = 1, and left metrics undefined. The result was the same.

Then I tried RandomizedSearchCV:

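from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from xgboost import XGBClassifier

params = {
    'min_child_weight': [1, 5, 10],
    'gamma': [0.5, 1, 1.5, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'max_depth': [3, 4, 5]
}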
alg = XGBClassifier(learning_rate=0.01, n_estimators=5, objective='binary:logistic',
                silent=True, nthread=1)
skf = StratifiedKFold(n_splits=5, shuffle = True, random_state = 1001)

random_search = RandomizedSearchCV(alg, param_distributions=params, n_iter=10, scoring='roc_auc', n_jobs=4, cv=skf.split(X_train, y_train), verbose=3, random_state=1001 )

random_search.fit(X_train, y_train)

print('\n All results:')
print(random_search.cv_results_)

y_pred = random_search.predict(X_train)
print(roc_auc_score(y_train, y_pred))

The output is:

All results:
{'mean_fit_time': array([0.27621794, 0.40631523, 0.36202598, 0.32188687, 0.34574351,
   0.2747798 , 0.31780529, 0.32190156, 0.34060073, 0.25945067]), 'std_fit_time': array([0.02603387, 0.04572275, 0.09460844, 0.01841953, 0.08391794,
   0.03654419, 0.01583525, 0.03670047, 0.01035465, 0.03085039]), 'mean_score_time': array([0.01927972, 0.0143033 , 0.01697631, 0.01260743, 0.02442002,
   0.02089334, 0.0182806 , 0.0132216 , 0.01498265, 0.01320119]), 'std_score_time': array([0.00609847, 0.00671443, 0.00613005, 0.00410744, 0.00384849,
   0.00516041, 0.00505873, 0.00276774, 0.00023382, 0.00546102]), 'param_subsample': masked_array(data=[1.0, 0.6, 0.8, 1.0, 0.8, 1.0, 1.0, 0.8, 0.8, 0.8],
         mask=[False, False, False, False, False, False, False, False,
               False, False],
   fill_value='?',
        dtype=object), 'param_min_child_weight': masked_array(data=[5, 1, 5, 5, 1, 10, 1, 1, 1, 1],
         mask=[False, False, False, False, False, False, False, False,
               False, False],
   fill_value='?',
        dtype=object), 'param_max_depth': masked_array(data=[3, 5, 5, 5, 4, 4, 5, 3, 5, 4],
         mask=[False, False, False, False, False, False, False, False,
               False, False],
   fill_value='?',
        dtype=object), 'param_gamma': masked_array(data=[5, 1.5, 1, 5, 1, 1.5, 5, 2, 0.5, 1.5],
         mask=[False, False, False, False, False, False, False, False,
               False, False],
   fill_value='?',
        dtype=object), 'param_colsample_bytree': masked_array(data=[1.0, 0.8, 0.8, 0.6, 1.0, 0.6, 0.6, 0.8, 0.6, 0.6],
         mask=[False, False, False, False, False, False, False, False,
               False, False],
   fill_value='?',
        dtype=object), 'params': [{'subsample': 1.0, 'min_child_weight': 5, 'max_depth': 3, 'gamma': 5, 'colsample_bytree': 1.0}, {'subsample': 0.6, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 1.5, 'colsample_bytree': 0.8}, {'subsample': 0.8, 'min_child_weight': 5, 'max_depth': 5, 'gamma': 1, 'colsample_bytree': 0.8}, {'subsample': 1.0, 'min_child_weight': 5, 'max_depth': 5, 'gamma': 5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 4, 'gamma': 1, 'colsample_bytree': 1.0}, {'subsample': 1.0, 'min_child_weight': 10, 'max_depth': 4, 'gamma': 1.5, 'colsample_bytree': 0.6}, {'subsample': 1.0, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 3, 'gamma': 2, 'colsample_bytree': 0.8}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 0.5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 4, 'gamma': 1.5, 'colsample_bytree': 0.6}], 'split0_test_score': array([0.75734333, 0.78965043, 0.78929122, 0.77842559, 0.78669592,
   0.77856369, 0.7803955 , 0.77733652, 0.78884686, 0.77706318]), 'split1_test_score': array([0.7564997 , 0.78553601, 0.78621578, 0.77250155, 0.78589665,
   0.77237991, 0.77235486, 0.77187115, 0.78573708, 0.77046652]), 'split2_test_score': array([0.75575839, 0.77356843, 0.79002323, 0.77134164, 0.76641651,
   0.76965581, 0.77133806, 0.76749842, 0.79029943, 0.77043647]), 'split3_test_score': array([0.74596394, 0.77188117, 0.76967513, 0.76816388, 0.76832059,
   0.76795065, 0.76942182, 0.76217902, 0.76846871, 0.75720452]), 'split4_test_score': array([0.78099172, 0.80616938, 0.80491224, 0.80371433, 0.81990511,
   0.82052725, 0.80327483, 0.80598102, 0.8171982 , 0.8052647 ]), 'mean_test_score': array([0.75931142, 0.78536108, 0.78802352, 0.7788294 , 0.78544696,
   0.78181546, 0.77935701, 0.77697323, 0.79011006, 0.77608708]), 'std_test_score': array([0.01159822, 0.0124273 , 0.0112318 , 0.01287854, 0.01920727,
   0.01968907, 0.01253142, 0.0153379 , 0.01563886, 0.01595216]), 'rank_test_score': array([10,  4,  2,  7,  3,  5,  6,  8,  1,  9], dtype=int32)}
0.6093407594278569

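So it is still the same problem: the score is about 0.8 during cross-validation and about 0.6 afterwards. I think different metrics are being used.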

The workaround I found is to pass scoring=make_scorer(roc_auc_score) to RandomizedSearchCV. With that, cross-validation gives the same result as the later evaluation, about 0.6.

Can anyone explain what the problem is? I still do not understand it, and I still do not know how to fix it in the optuna optimization.

score 0 · Accepted Answer

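You are using model.predict, which returns hard class labels, but the ROC curve (and therefore roc_auc_score) needs predicted probabilities (or possibly some other confidence measure); use model.predict_proba instead.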
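To see why hard labels pull the score down, here is a minimal sketch in plain Python. The `auc` helper and the toy numbers are made up for illustration and are not the asker's data. It computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, with ties counting as one half. Thresholding the scores at 0.5 turns most pairs into ties, so the AUC drops even though the underlying model is the same.

```python
def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. This is a hypothetical helper for illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (assumed values).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.10, 0.20, 0.30, 0.45, 0.35, 0.40, 0.60, 0.80]

# What predict() would return with a 0.5 threshold: hard 0/1 labels.
labels = [int(p >= 0.5) for p in proba]

auc_proba = auc(y_true, proba)    # ranking information is intact
auc_labels = auc(y_true, labels)  # only one operating point is left

print(auc_proba, auc_labels)
```

With hard labels the ROC curve has a single interior point, and the area under it equals (TPR + TNR) / 2, that is, balanced accuracy rather than a ranking quality measure.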

Scikit-learn: roc_auc_score
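The following sketch reproduces the gap with scikit-learn alone. Two things are assumptions: the data is a synthetic set from make_classification, and LogisticRegression stands in for XGBClassifier, which exposes the same predict and predict_proba methods. It also shows why make_scorer(roc_auc_score) made the cross-validation score drop to match: with default arguments that scorer is fed the output of predict, whereas the 'roc_auc' string scorer uses continuous scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_train / y_train (assumption: not the asker's data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

# Stand-in classifier (assumption); the question uses XGBClassifier.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hard 0/1 labels: what the question passed to roc_auc_score.
auc_from_labels = roc_auc_score(y, model.predict(X))
# Probability of the positive class (column 1): what roc_auc_score expects.
auc_from_proba = roc_auc_score(y, model.predict_proba(X)[:, 1])

# 'roc_auc' scores continuous outputs; make_scorer(roc_auc_score) with
# default arguments scores the hard labels returned by predict().
cv_string = cross_val_score(model, X, y, scoring='roc_auc', cv=5).mean()
cv_scorer = cross_val_score(model, X, y,
                            scoring=make_scorer(roc_auc_score), cv=5).mean()

print(f"labels: {auc_from_labels:.3f}  proba: {auc_from_proba:.3f}")
print(f"cv 'roc_auc': {cv_string:.3f}  cv make_scorer: {cv_scorer:.3f}")
```

Applied to the code in the question, the corresponding change would be roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]). The optuna objective itself should not need changes, since the 'auc' metric in xgb.cv is already computed on predicted scores. Note that a score computed on X_train is still a training score, so it need not equal the cross-validated test score exactly.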
