pandas - How do I cross-validate multiclass data?

Question

I can cross-validate binary data with the following call, but it does not seem to work for multiclass data:

> cross_validation.cross_val_score(alg, X, y, cv=cv_folds, scoring='roc_auc')

/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
    169         y_type = type_of_target(y)
    170         if y_type not in ("binary", "multilabel-indicator"):
--> 171             raise ValueError("{0} format is not supported".format(y_type))
    172 
    173         if is_regressor(clf):

ValueError: multiclass format is not supported

> y.head()

0    10
1     6
2    12
3     6
4    10
Name: rank, dtype: int64

> type(y)

pandas.core.series.Series

I also tried changing roc_auc to f1, but I still get an error:

/home/ubuntu/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
   1016         else:
   1017             raise ValueError("Target is %s but average='binary'. Please "
-> 1018                              "choose another average setting." % y_type)
   1019     elif pos_label not in (None, 1):
   1020         warnings.warn("Note that pos_label (set to %r) is ignored when "

ValueError: Target is multiclass but average='binary'. Please choose another average setting.

Is there a way to cross-validate this kind of data?

score 2 · Accepted Answer

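As Vivek Kumar pointed out in the comments, sklearn metrics support multiclass averaging for both the F1 score and ROC computations, albeit with some limitations when the data is imbalanced. So you can either build a scorer manually with the appropriate average parameter or use one of the predefined scoring names (e.g. 'f1_micro', 'f1_macro', 'f1_weighted').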
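A minimal sketch of both options, assuming the iris dataset and a LogisticRegression model as stand-ins for the asker's X, y and alg (neither appears in the original post). It passes a predefined scoring name and then a scorer built by hand with make_scorer; the two should yield the same per-fold scores.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

# Stand-in multiclass data (3 classes); replace with your own X and y.
X, y = load_iris(return_X_y=True)

# Stand-in estimator; any classifier works here.
alg = LogisticRegression(max_iter=1000)

# Option 1: a predefined scoring name with multiclass averaging built in.
scores_named = cross_val_score(alg, X, y, cv=5, scoring="f1_macro")

# Option 2: build the scorer manually and pass the average argument explicitly.
macro_f1 = make_scorer(f1_score, average="macro")
scores_custom = cross_val_score(alg, X, y, cv=5, scoring=macro_f1)

print(scores_named)
print(scores_custom)
```

Note that the import comes from sklearn.model_selection rather than the older cross_validation module used in the question.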

If you need several scores, use cross_validate instead of cross_val_score (available in the sklearn.model_selection module since sklearn 0.19).
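A hedged sketch of collecting several multiclass scores in one pass with cross_validate. The dataset and estimator are again stand-ins, not the asker's. The 'roc_auc_ovr' scoring name is an assumption about your environment: it needs scikit-learn 0.22 or newer and an estimator that exposes predict_proba.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Stand-in multiclass data and estimator; replace with your own.
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Several multiclass-aware metrics evaluated on the same folds.
# 'roc_auc_ovr' assumes scikit-learn >= 0.22.
scoring = ["f1_micro", "f1_macro", "f1_weighted", "roc_auc_ovr"]

results = cross_validate(alg, X, y, cv=5, scoring=scoring)

# Each metric is returned under the key 'test_<name>', one value per fold.
for name in scoring:
    print(name, results["test_" + name].mean())
```

The returned dictionary also contains fit_time and score_time arrays, which can help when comparing estimators.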
