machine-learning - Learning Weka - Precision and Recall - Wiki example to .Arff file

Question

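I'm new to WEKA and advanced statistics, and I'm starting from scratch to understand the WEKA measures. I have worked through all of @rushdi-shams's examples, which are great resources.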

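The Wikipedia article at http://en.wikipedia.org/wiki/Precision_and_recall explains the measures with a simple example: video software that detects 7 dogs in a scene containing 9 real dogs and some cats. I fully understand the example and the recall calculation. So as a first step, I want to reproduce it in Weka with the same data. How do I create such an .ARFF file? With the file below I get a wrong confusion matrix, and the Recall under Detailed Accuracy By Class is wrong: it is reported as 1, but it should be 4/9 (0.4444).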

@relation 'dogs and cat detection'

@attribute              'realanimal'      {dog,cat}
@attribute              'detected'        {dog,cat}
@attribute              'class'           {correct,wrong}

@data
dog,dog,correct
dog,dog,correct
dog,dog,correct
dog,dog,correct
cat,dog,wrong
cat,dog,wrong
cat,dog,wrong
dog,?,?
dog,?,?
dog,?,?
dog,?,?
dog,?,?
cat,?,?
cat,?,?

Weka output (no filter)

=== Run information ===

Scheme:weka.classifiers.rules.ZeroR 
Relation:     dogs and cat detection
Instances:    14
Attributes:   3
          realanimal
          detected
          class
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

ZeroR predicts class value: correct

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           4               57.1429 %
Incorrectly Classified Instances         3               42.8571 %
Kappa statistic                          0     
Mean absolute error                      0.5   
Root mean squared error                  0.5044
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances                7     
Ignored Class Unknown Instances          7     

=== Detailed Accuracy By Class ===

           TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
             1         1          0.571     1         0.727      0.65     correct
             0         0          0         0         0          0.136    wrong
Weighted Avg.    0.571     0.571      0.327     0.571     0.416      0.43 

=== Confusion Matrix ===

 a b   <-- classified as
 4 0 | a = correct
 3 0 | b = wrong

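Something must be wrong with the false negative dogs, or is my ARFF approach completely wrong and do I need a different kind of attribute?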

Thanks

score 6 · Accepted Answer

Let's start with the basic definitions of Precision and Recall.

Precision = TP/(TP+FP)
Recall = TP/(TP+FN)

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives.
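As a minimal sketch, the two definitions can be written as small Python functions. The function names are illustrative, and returning 0.0 for a zero denominator is an assumption chosen to match the 0 that Weka prints in the output above. The counts in the usage lines (TP = 4, FP = 3, FN = 5) are read off the question's data: 7 detections, 4 of them real dogs, out of 9 real dogs.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP). Returns 0.0 when nothing was predicted positive (assumed convention)."""
    denom = tp + fp
    return tp / denom if denom else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN). Returns 0.0 when there are no actual positives (assumed convention)."""
    denom = tp + fn
    return tp / denom if denom else 0.0


# Dog detection counts taken from the question's data.
tp, fp, fn = 4, 3, 5
dog_precision = precision(tp, fp)
dog_recall = recall(tp, fn)
print(round(dog_precision, 4), round(dog_recall, 4))  # 0.5714 0.4444
```

With these counts the recall is 4/9, the value the question expected to see.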

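In the dog.arff file above, Weka only took the first 7 tuples into account and ignored the remaining 7, because their class value is missing (?); this is the "Ignored Class Unknown Instances 7" line in the summary. As the output above shows, it classified all 7 remaining tuples as correct (the 4 correct tuples + the 3 wrong tuples).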
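The split into 7 evaluated and 7 ignored tuples can be checked with a short Python sketch. It does not use Weka; it only rebuilds the @data rows from the question and separates the rows whose last field (the class attribute) is `?`. The variable names are illustrative.

```python
from collections import Counter

# The 14 @data rows from the question's ARFF file.
rows = (
    ["dog,dog,correct"] * 4
    + ["cat,dog,wrong"] * 3
    + ["dog,?,?"] * 5
    + ["cat,?,?"] * 2
)

# The class attribute is the last field; "?" marks a missing value in ARFF.
labelled = [r for r in rows if r.split(",")[-1] != "?"]
ignored = [r for r in rows if r.split(",")[-1] == "?"]

class_counts = Counter(r.split(",")[-1] for r in labelled)
majority_class = class_counts.most_common(1)[0][0]

print(len(labelled), len(ignored))  # 7 7
print(dict(class_counts))           # {'correct': 4, 'wrong': 3}
print(majority_class)               # correct
```

ZeroR always predicts the majority class, which is why every evaluated tuple ends up labelled `correct`.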

Let's calculate the precision and recall for the correct and wrong classes. First, the correct class:

Prec = 4/(4+3) = 0.571428571
Recall = 4/(4+0) = 1.

For the wrong class:

Prec = 0/(0+0), which is undefined (0/0); Weka reports it as 0
Recall = 0/(0+3) = 0
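The same numbers can be derived directly from the confusion matrix in the Weka output. The following Python sketch is not Weka's implementation; it assumes rows are actual classes and columns are predicted classes (as the "classified as" header indicates) and treats a 0/0 ratio as 0, matching the printed table.

```python
LABELS = ["correct", "wrong"]

# Rows: actual class, columns: predicted class (copied from the Weka output).
CONFUSION = [
    [4, 0],  # actual = correct
    [3, 0],  # actual = wrong
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} computed from a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as this class, but actually another
        fn = sum(matrix[i]) - tp                 # actually this class, but predicted as another
        prec = tp / (tp + fp) if (tp + fp) else 0.0  # 0/0 treated as 0 (assumption)
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = (prec, rec)
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
for label, (prec, rec) in metrics.items():
    print(f"{label}: precision={prec:.3f} recall={rec:.3f}")
# correct: precision=0.571 recall=1.000
# wrong: precision=0.000 recall=0.000
```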
