
I did a grid search on a logistic regression with scoring set to 'roc_auc'. grid_clf1.best_score_ gave me an AUC of 0.7557. After that I wanted to plot the ROC curve of the best model, but the curve I got had an AUC of 0.50. I do not understand this at all.

I looked into the predicted probabilities and saw that they were all 0.0 or 1.0. Hence, I think something went wrong here, but I cannot find what it is.
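As an aside on that symptom: scores that are all exactly 0.0 or 1.0 are often a sign that hard class labels (as returned by `predict`) were scored instead of graded probabilities, and thresholded or constant scores flatten the ROC curve. A minimal, dependency-free sketch of the rank-based (Mann-Whitney) AUC on made-up data, not the asker's, shows the effect:

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparisons (Mann-Whitney U); O(n^2) but dependency-free.
    Ties between a positive and a negative score count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 0, 1]
probs  = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # graded probabilities: ranking is informative
hard   = [round(p) for p in probs]          # thresholded to 0/1, as in the question

print(roc_auc(y_true, probs))        # ~0.889: uses the full ranking
print(roc_auc(y_true, hard))         # lower: thresholding discards ranking information
print(roc_auc(y_true, [1.0] * 6))    # exactly 0.5: constant scores, every pair ties
```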

My code is as follows for the grid search cv:

clf1 = Pipeline([('RS', RobustScaler()),
                 ('LR', LogisticRegression(random_state=1, solver='saga'))])

params = {'LR__C': np.logspace(-3, 0, 5),
          'LR__penalty': ['l1']}

grid_clf1 = GridSearchCV(clf1, params, scoring='roc_auc', cv=5,
                         n_jobs=-1)

grid_clf1.fit(X_train, y_train)
grid_clf1.best_estimator_
grid_clf1.best_score_

So this gave an AUC of 0.7557 for the best model. Then if I calculate the AUC for the model myself:

y_pred_proba = grid_clf1.best_estimator_.predict_proba(X_test)[:, 1]

print(roc_auc_score(y_test, y_pred_proba))

This gave me an AUC of 0.50.


1 Answer


There seem to be two issues with your sample code:

  1. You are comparing ROC AUC scores on different datasets: best_score_ is computed on the training set during fitting, while your roc_auc_score call uses the test set.
  2. Scoring via cross-validation is slightly different from a single roc_auc_score call: it expands to roughly np.mean(cross_val_score(...)).

So if you take this into account, you will get the same score values. You can use the Colab notebook as a reference.
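The two points above can be sketched end to end. This is an illustration on synthetic data (make_classification and the train/test split below are assumptions, not the asker's dataset), reusing the question's pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf1 = Pipeline([('RS', RobustScaler()),
                 ('LR', LogisticRegression(random_state=1, solver='saga'))])
params = {'LR__C': np.logspace(-3, 0, 5), 'LR__penalty': ['l1']}
grid_clf1 = GridSearchCV(clf1, params, scoring='roc_auc', cv=5)
grid_clf1.fit(X_train, y_train)

# Point 2: best_score_ is the mean cross-validated AUC of the best
# parameters on the *training* set, i.e. np.mean(cross_val_score(...))
cv_auc = np.mean(cross_val_score(grid_clf1.best_estimator_, X_train, y_train,
                                 scoring='roc_auc', cv=5))

# Point 1: this is a single AUC on the held-out *test* set, so it can
# legitimately differ from best_score_
test_auc = roc_auc_score(y_test,
                         grid_clf1.best_estimator_.predict_proba(X_test)[:, 1])

print(grid_clf1.best_score_, cv_auc, test_auc)
```

With the same CV splitter (stratified 5-fold, no shuffling) and fixed random states, cv_auc reproduces best_score_, while test_auc is a different quantity measured on different data.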

Answered 2019-04-04T17:16:57.093