python - How do I get the precision for the top 5 topics from my classifier?

Question

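I have 22465 test documents that I am classifying into 88 different topics. I am using predict_proba to get the top 5 predicted topics. How can I print the precision for these 5 topics?

To be specific, this is what I am doing: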

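import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model1 = LogisticRegression()
model1 = model1.fit(matrix, labels)

y_train_pred = model1.predict_log_proba(matrix_test)
order = np.argsort(y_train_pred, axis=1)
print(order[:, -5:])  # column indices of the 5 most probable classes

n = model1.classes_[order[:, -5:]]  # top 5 class labels for each document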

For the accuracy:

z=0
for x, y in zip(label_tmp_test, n):
    if x in y:
        z=z+1
print(z)
print(z/22465) #This gives me the accuracy by considering top 5 topics
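
The counting loop above can also be wrapped in a small helper. This is only a minimal sketch: `top_k_accuracy` is a hypothetical name, not a scikit-learn function, and the toy labels are made up for illustration.

```python
def top_k_accuracy(y_true, top_k_preds):
    """Fraction of documents whose true label is among their top-k predictions."""
    hits = sum(
        1
        for true_label, candidates in zip(y_true, top_k_preds)
        if true_label in candidates
    )
    return hits / len(y_true)


# Toy data (made up): 4 documents, top-2 predicted labels for each
y_true = ["a", "b", "a", "c"]
top_k_preds = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "c"]]
print(top_k_accuracy(y_true, top_k_preds))  # 2 hits out of 4 -> 0.5
```

With the variables from the question, the call would presumably be `top_k_accuracy(label_tmp_test, n)`.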

How do I find the precision for the top 5 topics in the same way? The scikit-learn metrics refuse to work with this:

q=model1.predict(mat_tmp_test)
print(metrics.precision_score(n, q))

score 0 · Accepted Answer

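With your approach, precision is computed in almost the same way. You just need to focus on a particular label (because precision is a per-label metric). Suppose you are computing the precision for label L: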

TP = 0.
FP = 0.
for x, y in zip(label_tmp_test, n):
    if x == L:  # this is the label we are interested in
        if L in y:  # the correct label is among the selected ones
            TP = TP + 1  # one more true positive
    else:  # the true label is some other label
        if L in y:  # L was predicted for a document that is not L
            FP = FP + 1  # one more false positive

# Guard against division by zero when L never appears in any top-5 list
print(TP / (TP + FP) if TP + FP else 0.0)
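
The same per-label count can be packaged as a reusable function. This is a sketch under stated assumptions: `top_k_precision` is a hypothetical helper name, the toy data is invented, and returning 0.0 when the label is never predicted is a convention chosen here, not something the original answer specifies.

```python
def top_k_precision(y_true, top_k_preds, label):
    """Precision for one label when every document gets k predicted labels."""
    tp = 0  # label is in the top-k list and is the true label
    fp = 0  # label is in the top-k list but the true label is different
    for true_label, candidates in zip(y_true, top_k_preds):
        if label in candidates:
            if true_label == label:
                tp += 1
            else:
                fp += 1
    if tp + fp == 0:
        return 0.0  # assumption: treat "never predicted" as precision 0.0
    return tp / (tp + fp)


# Toy data (made up): 4 documents, top-2 predicted labels for each
y_true = ["a", "b", "a", "c"]
top_k_preds = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "c"]]
print(top_k_precision(y_true, top_k_preds, "a"))  # 1 TP, 2 FP
```

Note that with k predictions per document, each document produces at least k - 1 false positives across labels, so these per-label precisions will be low by construction.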

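Now, if you need a "general" precision, you would typically average the per-label precisions. For obvious reasons, you need a large number of labels for these measures to be meaningful.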
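
Averaging the per-label precisions (a macro average) could look roughly like the following. This is a sketch, not a scikit-learn API: `macro_top_k_precision` is a hypothetical name, the toy data is made up, and scoring never-predicted labels as 0.0 is an assumption.

```python
from collections import Counter


def macro_top_k_precision(y_true, top_k_preds, labels):
    """Return (per-label precision dict, unweighted mean over `labels`)."""
    predicted = Counter()  # how often each label appears in a top-k list (TP + FP)
    true_pos = Counter()   # how often it appears there and is the true label (TP)
    for true_label, candidates in zip(y_true, top_k_preds):
        for label in set(candidates):
            predicted[label] += 1
            if label == true_label:
                true_pos[label] += 1
    per_label = {
        # assumption: a label that is never predicted gets precision 0.0
        label: (true_pos[label] / predicted[label]) if predicted[label] else 0.0
        for label in labels
    }
    macro = sum(per_label.values()) / len(per_label)
    return per_label, macro


# Toy data (made up): 4 documents, top-2 predicted labels for each
y_true = ["a", "b", "a", "c"]
top_k_preds = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "c"]]
per_label, macro = macro_top_k_precision(y_true, top_k_preds, ["a", "b", "c"])
print(per_label)
print(macro)
```

With the variables from the question, `labels` would presumably be `model1.classes_`.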
