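I am running a Naive Bayes classifier using Python, Pandas, and NLTK.
I understand in general what precision and recall are and how they are computed, but I don't understand why I get a pair of precision values and a pair of recall values when I use the following command: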
from featx import precision_recall
nb_precisions, nb_recalls = precision_recall(nb_classifier, test_feats)
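from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy
training_n = int(data_n * 0.25) ### changeable
featuresets = [(first_letter(name), ethnicity) for index, (name, ethnicity, last_name) in df.iterrows()] ### changeable
# The first training_n items become the test set; the rest are used for training
train_feats, test_feats = featuresets[training_n:], featuresets[:training_n]
nb_classifier = NaiveBayesClassifier.train(train_feats)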
# Performance
print("Accuracy: " + str(accuracy(nb_classifier, test_feats)))
# Precision and recall
nb_precisions, nb_recalls = precision_recall(nb_classifier, test_feats)
print("Precision +: " + str(nb_precisions[ethnic_target1]))
print("Precision -: " + str(nb_precisions[ethnic_non_target]))
print("Recall +: " + str(nb_recalls[ethnic_target1]))
print("Recall -: " + str(nb_recalls[ethnic_non_target]))
Accuracy: 0.99632
Precision +: None
Precision -: 0.99632
Recall +: 0.0
Recall -: 1.0
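For what it's worth, the paired values look like one precision and one recall per label. The sketch below is only a guess at how a helper like precision_recall could work, not the actual featx code: per_label_precision_recall, always_negative, and the toy data are hypothetical names and values made up for illustration (Python 3). It returns None for a label's precision when the classifier never predicts that label.

```python
from collections import defaultdict

def per_label_precision_recall(classify, test_feats):
    """Return ({label: precision}, {label: recall}) for labeled test data.

    `classify` is any function mapping a feature dict to a label
    (a hypothetical stand-in for a trained classifier).
    """
    gold = defaultdict(set)       # label -> indices whose true label is `label`
    predicted = defaultdict(set)  # label -> indices predicted as `label`
    for i, (feats, label) in enumerate(test_feats):
        gold[label].add(i)
        predicted[classify(feats)].add(i)

    precisions, recalls = {}, {}
    for label in set(gold) | set(predicted):
        hits = len(gold[label] & predicted[label])
        # Precision is undefined (None) if the label was never predicted.
        precisions[label] = hits / len(predicted[label]) if predicted[label] else None
        # Recall is undefined (None) if the label never occurs in the gold data.
        recalls[label] = hits / len(gold[label]) if gold[label] else None
    return precisions, recalls


def always_negative(feats):
    # Toy classifier that predicts the majority class for every input.
    return "non-Chinese"

# Made-up toy data: 1 positive example out of 4.
toy_test_feats = [
    ({"first_letter": "z"}, "Chinese"),
    ({"first_letter": "j"}, "non-Chinese"),
    ({"first_letter": "m"}, "non-Chinese"),
    ({"first_letter": "a"}, "non-Chinese"),
]
toy_precisions, toy_recalls = per_label_precision_recall(always_negative, toy_test_feats)
print(toy_precisions["Chinese"], toy_precisions["non-Chinese"])  # None 0.75
print(toy_recalls["Chinese"], toy_recalls["non-Chinese"])        # 0.0 1.0
```

In this toy run the pattern is the same as in my output: a classifier that always answers "non-Chinese" gives None for the positive precision, 0.0 for the positive recall, and 1.0 for the negative recall.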
The classification is Chinese vs. non-Chinese, using the first letter of a person's name as the feature.
                          Gold standard
                          Chinese    non-Chinese
Predicted   Chinese       A          C
            non-Chinese   B          D
My understanding is that precision = A/(A+C) and recall = D/(B+D).
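As a cross-check of these formulas, here is a small sketch that uses the same table layout (rows are predictions, columns are the gold standard). It assumes the usual per-class definitions, which may or may not be exactly what precision_recall reports; the function name class_metrics and the counts are made up for illustration.

```python
def class_metrics(a, b, c, d):
    """Per-class precision and recall from the four cells of the table.

    a: predicted Chinese,     gold Chinese
    b: predicted non-Chinese, gold Chinese
    c: predicted Chinese,     gold non-Chinese
    d: predicted non-Chinese, gold non-Chinese
    """
    def ratio(num, den):
        # Undefined (None) when the denominator is zero.
        return num / den if den else None

    return {
        "Chinese": {"precision": ratio(a, a + c), "recall": ratio(a, a + b)},
        "non-Chinese": {"precision": ratio(d, b + d), "recall": ratio(d, c + d)},
    }

# Hypothetical counts: nothing is ever predicted as Chinese.
example = class_metrics(a=0, b=2, c=0, d=98)
print(example["Chinese"])      # {'precision': None, 'recall': 0.0}
print(example["non-Chinese"])  # {'precision': 0.98, 'recall': 1.0}
```

Under these definitions, A/(A+C) is the precision of the Chinese class, while D/(B+D) is the precision of the non-Chinese class; the recall of the Chinese class would be A/(A+B).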