我一直在通过 Weka 运行数据集,应用 NB。我坚持以下问题:在分析它时,我注意到属性部分中的总数与日志中出现的实例总数之间的差异。
如果对“a0”属性求和,您会注意到 Weka 点 1044 个实例。如果选中“Instances”,则为 1036。
数据集实际上包含 1036 个实例。
有人对此有解释吗?谢谢。
这是一个日志粘贴:
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: teste.carro
Instances: 1036
Attributes: 7
a0
a1
a2
a3
a4
a5
class
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute 0 1
(0.5) (0.5)
===========================
a0
1 105.0 175.0
2 112.0 165.0
3 153.0 109.0
4 152.0 73.0
[total] 522.0 522.0
a1
1 101.0 165.0
2 123.0 165.0
3 136.0 119.0
4 162.0 73.0
[total] 522.0 522.0
a2
1 150.0 107.0
2 122.0 133.0
3 121.0 141.0
4 129.0 141.0
[total] 522.0 522.0
a3
1 247.0 1.0
2 134.0 265.0
3 140.0 255.0
[total] 521.0 521.0
a4
1 189.0 127.0
2 177.0 185.0
3 155.0 209.0
[total] 521.0 521.0
a5
1 244.0 1.0
2 160.0 220.0
3 117.0 300.0
[total] 521.0 521.0
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.01 seconds
=== Summary ===
Correctly Classified Instances 957 92.3745 %
Incorrectly Classified Instances 79 7.6255 %
Kappa statistic 0.8475
Mean absolute error 0.1564
Root mean squared error 0.2398
Relative absolute error 31.2731 %
Root relative squared error 47.9651 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 80.2124 %
Total Number of Instances 1036
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0,847 0,000 1,000 0,847 0,917 0,858 0,989 0,991 0
1,000 0,153 0,868 1,000 0,929 0,858 0,989 0,988 1
Weighted Avg. 0,924 0,076 0,934 0,924 0,923 0,858 0,989 0,989
=== Confusion Matrix ===
a b <-- classified as
439 79 | a = 0
0 518 | b = 1