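I have a multi-label classification task using FastText, and I need to compute a confusion matrix for it. I have already solved computing the CM for the single-label case. Here is the Python script for it: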
import argparse
import numpy as np
from sklearn.metrics import confusion_matrix

def parse_labels(path):
    # Split the whole file on whitespace and drop the 9-character "__label__"
    # prefix from every token. Note that line boundaries are not kept.
    with open(path, 'r') as f:
        return np.array(list(map(lambda x: x[9:], f.read().split())))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Display confusion matrix.')
    parser.add_argument('test', help='Path to test labels')
    parser.add_argument('predict', help='Path to predictions')
    args = parser.parse_args()
    test_labels = parse_labels(args.test)
    print("Test labels:%d (sample)\n%s" % (len(test_labels), test_labels[:1]))
    pred_labels = parse_labels(args.predict)
    print("Predicted labels:%d (sample)\n%s" % (len(pred_labels), pred_labels[:1]))
    # Element-wise comparison: accuracy is the fraction of positions that match
    eq = test_labels == pred_labels
    print("Accuracy: " + str(eq.sum() / len(test_labels)))
    print(confusion_matrix(test_labels, pred_labels))
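The `parse_labels` function above reads the whole file with `f.read().split()`, so every label token ends up in one flat array and the information about which labels belong to which line is discarded. A minimal sketch of a line-preserving parser follows; `parse_label_lines` is a hypothetical helper, and it assumes FastText's default `__label__` prefix and that only tokens carrying that prefix are labels.

```python
import io
from typing import List, Set, TextIO

# Assumption: FastText's default label prefix (the script strips it with x[9:]).
LABEL_PREFIX = "__label__"


def parse_label_lines(f: TextIO) -> List[Set[str]]:
    """Hypothetical helper: return one set of labels per line of `f`.

    Unlike f.read().split(), which merges the tokens of all lines into one
    flat sequence, this keeps line i of the file as element i of the result.
    Only tokens that start with LABEL_PREFIX are treated as labels.
    """
    rows = []
    for line in f:
        rows.append({tok[len(LABEL_PREFIX):] for tok in line.split()
                     if tok.startswith(LABEL_PREFIX)})
    return rows


if __name__ == "__main__":
    # Two lines with two predicted labels each.
    text = "__label__pop __label__rock\n__label__jazz __label__pop\n"
    print(len(text.split()))                     # 4: tokens of both lines, flattened
    print(parse_label_lines(io.StringIO(text)))  # one set of labels per line
```

With the real files, an object returned by `open(args.predict)` can be passed instead of the `io.StringIO` object.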
This outputs something like:
Test labels:539328 (sample)
['pop']
Predicted labels:539328 (sample)
['unknown']
Accuracy: 0.17639914857
[[6126 0 0 ..., 0 0 0]
[ 55 0 0 ..., 0 0 0]
[ 6 0 0 ..., 0 0 0]
...,
[ 0 0 0 ..., 0 0 0]
[ 0 0 0 ..., 0 0 0]
[ 0 0 0 ..., 0 0 0]]
The problem is that, in the specific case of a multi-label task, this does not work correctly, because I compute accuracy with
eq = test_labels == pred_labels
eq.sum() / len(test_labels)
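This works fine when the files have one column (one label per line), but not when FastText's prediction output is a file with two columns (two labels per line).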
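With several labels per line, a single N×N confusion matrix over label strings no longer fits, because one sample can be both right and wrong at once. A common alternative is one 2×2 (one-vs-rest) matrix per label, plus an exact-match ("subset") accuracy over whole label sets. scikit-learn provides `sklearn.metrics.multilabel_confusion_matrix` for the per-label matrices, on binary indicator input such as that produced by `sklearn.preprocessing.MultiLabelBinarizer`. The stdlib-only sketch below computes the same counts so the logic is visible; the function names are hypothetical, and it assumes the `__label__` prefix and that the test and prediction files have the same number of lines, aligned line by line.

```python
from typing import Dict, List, Set, Tuple

# Assumption: FastText's default label prefix.
LABEL_PREFIX = "__label__"


def to_label_set(line: str) -> Set[str]:
    """Labels on one line, with the prefix removed (hypothetical helper)."""
    return {t[len(LABEL_PREFIX):] for t in line.split() if t.startswith(LABEL_PREFIX)}


def _check(y_true: List[Set[str]], y_pred: List[Set[str]]) -> None:
    if len(y_true) != len(y_pred):
        raise ValueError("test and prediction files must have the same number of lines")
    if not y_true:
        raise ValueError("no samples")


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of lines whose predicted label set equals the true set exactly."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_confusion(y_true: List[Set[str]],
                        y_pred: List[Set[str]]) -> Dict[str, Tuple[int, int, int, int]]:
    """One-vs-rest counts per label as (tn, fp, fn, tp), labels in sorted order."""
    _check(y_true, y_pred)
    n = len(y_true)
    labels = set().union(*y_true, *y_pred)
    result = {}
    for label in sorted(labels):
        tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
        fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
        fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
        result[label] = (n - tp - fn - fp, fp, fn, tp)
    return result


if __name__ == "__main__":
    test = [to_label_set(s) for s in ["__label__pop __label__rock", "__label__jazz"]]
    pred = [to_label_set(s) for s in ["__label__pop __label__rock", "__label__jazz __label__pop"]]
    print(subset_accuracy(test, pred))  # 0.5
    for label, (tn, fp, fn, tp) in per_label_confusion(test, pred).items():
        print(label, [[tn, fp], [fn, tp]])
```

Each tuple is ordered (tn, fp, fn, tp), which matches the `[[tn, fp], [fn, tp]]` layout of each 2×2 block returned by `multilabel_confusion_matrix`. Subset accuracy is strict: a line counts as correct only if its whole predicted label set matches.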