python - Using sklearn's learning_curve() to plot F1 for a specific class instead of the weighted F1 score

Question

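When plotting a learning curve for a boolean supervised classifier with sklearn.model_selection.learning_curve(), it displays the weighted F1 score by default.

But I want to plot the F1 score for a specific class. In this case, the positive (aka: 1) class.

In the context of the report below (from sklearn.metrics.classification_report), it is plotting avg / total, but I want to plot the metric for class 1.

Plot

Code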

...
estimator = classifier_class()
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(estimator, X_recombined, y_recombined, cv=cv) # n_jobs=n_jobs, train_sizes=train_sizes

train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plt.grid()

plt.fill_between(train_sizes, 
                 train_scores_mean - train_scores_std,
                 train_scores_mean + train_scores_std, 
                 alpha=0.1, color="r")

plt.fill_between(train_sizes, 
                 test_scores_mean - test_scores_std,
                 test_scores_mean + test_scores_std, 
                 alpha=0.1, color="g")

plt.plot(train_sizes, train_scores_mean, 'o-', color="r", label="Training score")

plt.plot(train_sizes, test_scores_mean, 'o-', color="g", label="Cross-validation score")

plt.legend(loc="best")

Is this possible?

score 3 · Accepted Answer

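For binary classification

You can pass a custom scorer to learning_curve through its scoring parameter. From the documentation:

scoring : string, callable or None, optional, default: None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).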
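To make the scorer(estimator, X, y) signature concrete, here is a minimal sketch of a plain function passed straight to scoring. The function name f1_for_class_1, the synthetic data from make_classification, and the choice of LogisticRegression are all assumptions for illustration and are not part of the question's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ShuffleSplit, learning_curve


def f1_for_class_1(estimator, X, y):
    """Hypothetical scorer with the (estimator, X, y) signature.

    Returns the F1 score of class 1 only, not an average over classes.
    """
    y_pred = estimator.predict(X)
    return f1_score(y, y_pred, pos_label=1, average="binary")


# Synthetic binary data, for illustration only
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)

# learning_curve calls the scorer on each fitted estimator
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring=f1_for_class_1
)

# One row per training-set size, one column per CV split
print(train_scores.shape, test_scores.shape)
```

The returned arrays can then be averaged along axis=1 and plotted exactly as in the question's code.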

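Also, the documentation for sklearn.metrics.f1_score says:

pos_label : str or int, 1 by default

The class to report if average='binary' and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != 'binary' will report scores for that label only.

average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

'binary': Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.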
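As a quick check of what these parameters do, the sketch below calls f1_score on small hand-made label lists (the data is invented for illustration). It shows the per-class score selected by pos_label, the per-class array returned by average=None, and the labels=[...] route for multiclass targets.

```python
from sklearn.metrics import f1_score

# Toy binary labels: 6 negatives, 4 positives
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# F1 for one class only, selected with pos_label
f1_class_1 = f1_score(y_true, y_pred, pos_label=1, average="binary")  # 4/7
f1_class_0 = f1_score(y_true, y_pred, pos_label=0, average="binary")  # 10/13

# average=None returns one score per class, ordered by sorted label
f1_per_class = f1_score(y_true, y_pred, average=None)

# Support-weighted mean of the per-class scores
f1_weighted = f1_score(y_true, y_pred, average="weighted")

# Multiclass: pos_label is ignored, so restrict to one label via labels=[...]
y_true_mc = [0, 1, 2, 2, 1, 0]
y_pred_mc = [0, 2, 2, 2, 1, 0]
f1_label_2 = f1_score(y_true_mc, y_pred_mc, labels=[2], average="macro")  # 0.8

print(f1_class_1, f1_class_0, f1_per_class, f1_weighted, f1_label_2)
```

Here the weighted score sits between the two per-class scores, which is why a weighted curve can hide how the positive class is doing.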

So you can do this:

from sklearn.model_selection import learning_curve
from sklearn.metrics import f1_score, make_scorer

# Custom scorer: F1 for a single class only
target = 0  # class you want to plot (use 1 for the positive class, as in the question)
# make_scorer forwards the extra keyword arguments to f1_score
scorer = make_scorer(f1_score, pos_label=target, average='binary')

train_sizes, train_scores, test_scores = learning_curve(
    estimator, 
    X, 
    y, 
    cv=cv,
    scoring=scorer)
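For the positive class specifically, the built-in scoring string "f1" should give the same curve, since it is binary F1 with the default pos_label=1. The sketch below compares the two on synthetic data; the data set, the LogisticRegression estimator, and the helper name mean_test_curve are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic, mildly imbalanced binary data (illustration only)
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.7, 0.3], random_state=0
)
cv = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sizes = np.linspace(0.2, 1.0, 5)


def mean_test_curve(scoring):
    """Hypothetical helper: mean cross-validation score per training size."""
    _, _, test_scores = learning_curve(
        LogisticRegression(max_iter=1000),
        X, y, cv=cv, train_sizes=sizes, scoring=scoring,
    )
    return test_scores.mean(axis=1)


# Class-1 F1 via a custom scorer and via the built-in "f1" string
class_1_custom = mean_test_curve(
    make_scorer(f1_score, pos_label=1, average="binary")
)
class_1_builtin = mean_test_curve("f1")

# Weighted F1 for comparison
weighted = mean_test_curve("f1_weighted")

print(class_1_custom)
print(class_1_builtin)
print(weighted)
```

For any class other than 1, the make_scorer version with an explicit pos_label is still needed.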
