machine-learning - scikit weighted F1 score calculation and usage

Question

I have a question about the weighted average in sklearn.metrics.f1_score.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

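First, is there any reference that justifies the use of weighted F1? I am just curious in which cases I should use weighted F1.

Second, I heard that weighted-F1 is deprecated. Is that true?

Third, how is weighted F1 actually calculated? For example: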

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted F1 for the example above? I thought it should be (0.8*2/3 + 0.4*1/3)/3, but I was wrong.

score 10 · Accepted Answer

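First, is there any reference that justifies the use of weighted F1? I am just curious in which cases I should use weighted F1.

I don't have any references, but if you are interested in multi-label classification where you care about the precision/recall of all classes, then the weighted F1 score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.

Second, I heard that weighted-F1 is deprecated. Is that true?

No, weighted-F1 itself is not deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make it more explicit in situations that were previously ambiguous. (See the historical discussion on GitHub, or check the source code and search the page for "deprecated" for details.)

Third, how is weighted F1 actually calculated?

From the documentation of f1_score: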
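To make the distinction concrete, here is a minimal sketch comparing the averaging modes on a small imbalanced binary problem. The labels and predictions are made up for illustration and do not come from the question; the calls use f1_score with its average and pos_label parameters as shown in the signature quoted above.

```python
from sklearn.metrics import f1_score

# Hypothetical imbalanced data: 8 negatives (label 0) and 2 positives (label 1).
y_true = [0] * 8 + [1] * 2
# Every negative is predicted correctly; one of the two positives is missed.
y_pred = [0] * 8 + [1, 0]

# Per-label F1 averaged with weights proportional to each label's support.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
# Per-label F1 averaged with equal weight per label.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# F1 of the positive class only.
positive_f1 = f1_score(y_true, y_pred, average="binary", pos_label=1)

print(f"weighted: {weighted_f1:.3f}")
print(f"macro:    {macro_f1:.3f}")
print(f"binary:   {positive_f1:.3f}")
```

In this sketch the weighted score is the highest of the three because the well-predicted majority class carries 80% of the weight, while the binary score reflects only the positive class. That is why the weighted average can be misleading when only the positive samples matter.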

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

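So the average is weighted by the support, which is the number of samples with a given true label. Your example data does not list the support explicitly, so the weighted F1 score cannot be read off directly from the per-label F1 values alone. Note, however, that the support of a label equals TP + FN, so it can be recovered from the counts you listed.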
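As a rough sketch of the calculation: although the counts in the question do not list support directly, support is the number of true instances of a label, which equals TP + FN. Under that reading, the weighted F1 can be computed in plain Python as below. The helper names f1_from_counts and weighted_f1 are hypothetical, and treating label "1" (which has no true positives) as F1 = 0 rather than the question's -1 is an assumption.

```python
# Per-label counts copied from the question.
counts = {
    "0": {"TP": 2, "FP": 1, "FN": 0},
    "1": {"TP": 0, "FP": 2, "FN": 2},
    "2": {"TP": 1, "FP": 1, "FN": 2},
}


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); assumed to be 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts):
    """Average the per-label F1 scores, weighting each by its support (TP + FN)."""
    total_support = sum(c["TP"] + c["FN"] for c in counts.values())
    weighted_sum = sum(
        f1_from_counts(c["TP"], c["FP"], c["FN"]) * (c["TP"] + c["FN"])
        for c in counts.values()
    )
    return weighted_sum / total_support


score = weighted_f1(counts)
print(round(score, 4))  # (0.8*2 + 0.0*2 + 0.4*3) / 7
```

With supports of 2, 2 and 3, the weights are 2/7, 2/7 and 3/7. The attempt in the question differs in two ways: it weights by 2/3 and 1/3 instead of by support share, and it divides by the number of labels a second time.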
