machine-learning - Classification using J48 in Weka

Question

I use this data as the training set, with the PlayTennis attribute as the target.

@relation Weka

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}

@data
D1,Sunny,Hot,High,Weak,No
D2,Sunny,Hot,High,Strong,No
D3,Overcast,Hot,High,Weak,Yes
D4,Rain,Mild,High,Weak,Yes
D5,Rain,Cool,Normal,Weak,Yes
D6,Rain,Cool,Normal,Strong,No
D7,Overcast,Cool,Normal,Strong,Yes
D8,Sunny,Mild,High,Weak,No
D9,Sunny,Cool,Normal,Weak,Yes
D10,Rain,Mild,Normal,Weak,Yes
D11,Sunny,Mild,Normal,Strong,Yes
D12,Overcast,Mild,High,Strong,Yes
D13,Overcast,Hot,Normal,Weak,Yes
D14,Rain,Mild,High,Strong,No

I also gave Weka the same data as the test set, but with the target values [Yes, No] replaced by "?". Like this:

@relation Weka2

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14}
@attribute Outlook {Sunny,Overcast,Rain}
@attribute Temperature {Hot,Mild,Cool}
@attribute Humidity {High,Normal}
@attribute Wind {Weak,Strong}
@attribute PlayTennis {No,Yes}

@data
D1,Sunny,Hot,High,Weak,?
D2,Sunny,Hot,High,Strong,?
D3,Overcast,Hot,High,Weak,?
D4,Rain,Mild,High,Weak,?
D5,Rain,Cool,Normal,Weak,?
D6,Rain,Cool,Normal,Strong,?
D7,Overcast,Cool,Normal,Strong,?
D8,Sunny,Mild,High,Weak,?
D9,Sunny,Cool,Normal,Weak,?
D10,Rain,Mild,Normal,Weak,?
D11,Sunny,Mild,Normal,Strong,?
D12,Overcast,Mild,High,Strong,?
D13,Overcast,Hot,Normal,Weak,?
D14,Rain,Mild,High,Strong,?

I clicked Start, but the result is this:

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     Weka
Instances:    14
Attributes:   6
              Day
              Outlook
              Temperature
              Humidity
              Wind
              PlayTennis
Test mode:    user supplied test set:  size unknown     (reading incrementally)

=== Classifier model (full training set) ===

J48 pruned tree
------------------

Outlook = Sunny
|   Humidity = High: No (3.0)
|   Humidity = Normal: Yes (2.0)
Outlook = Overcast: Yes (4.0)
Outlook = Rain
|   Wind = Weak: Yes (3.0)
|   Wind = Strong: No (2.0)

Number of Leaves  :     5

Size of the tree :  8


Time taken to build model: 0 seconds

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0 seconds

=== Summary ===

Total Number of Instances                0     
Ignored Class Unknown Instances                  7     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.000    0.000    0.000      0.000    0.000      0.000    ?         ?         No
                 0.000    0.000    0.000      0.000    0.000      0.000    ?         ?         Yes
Weighted Avg.    NaN      NaN      NaN        NaN      NaN        NaN      NaN       NaN       

=== Confusion Matrix ===

 a b   <-- classified as
 0 0 | a = No
 0 0 | b = Yes

It says "Ignored Class Unknown Instances = 14" and "Total Number of Instances = 0".

I don't understand what I have to do.

Can you help me, please?

score 1 · Accepted Answer

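The test dataset should keep the target variable, labelled as either Yes or No.

This allows Weka to evaluate the quality of its predictions. Without the target labels, Weka cannot tell whether a prediction is right or wrong, so it ignores those instances in the evaluation.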
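To make the point concrete, here is a minimal sketch in plain Java of what an evaluation loop has to do with unlabelled instances. It is a simplified illustration, not Weka's actual evaluation code; the class name `EvalSketch` and the method `evaluate` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; this is not Weka's actual evaluation code.
class EvalSketch {
    // Returns {evaluated, correct, ignored} for parallel lists of actual and predicted labels.
    static int[] evaluate(List<String> actual, List<String> predicted) {
        int evaluated = 0, correct = 0, ignored = 0;
        for (int i = 0; i < actual.size(); i++) {
            if ("?".equals(actual.get(i))) {
                ignored++;   // no true label, so correctness cannot be judged
                continue;
            }
            evaluated++;
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return new int[] {evaluated, correct, ignored};
    }

    public static void main(String[] args) {
        List<String> predicted = Arrays.asList("No", "No", "Yes", "Yes");

        // Every class value is "?": nothing can be scored.
        int[] unlabelled = evaluate(Arrays.asList("?", "?", "?", "?"), predicted);
        System.out.println("unlabelled: evaluated=" + unlabelled[0] + " ignored=" + unlabelled[2]);

        // With real labels, every instance is scored.
        int[] labelled = evaluate(Arrays.asList("No", "No", "Yes", "No"), predicted);
        System.out.println("labelled: evaluated=" + labelled[0] + " correct=" + labelled[1]);
    }
}
```

With an all-"?" class column, the evaluated count stays at zero and every instance lands in the ignored bucket, which is the same pattern as the summary in the question.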

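If you are only interested in the predictions, you can still use the unlabelled data.

For example, if using the GUI:

Load your training data and select the Classify tab.

Press the "More options" button in the Test options box.

Now tick the box next to "Output predictions".

Supply your unlabelled test data and press the Start button.

This produces output containing predictions for the seemingly ignored instances (an example of the relevant output is below).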

=== Predictions on test split ===
inst#, actual, predicted, error, probability distribution
     1 ? 2:No + 0 *1
     2 ? 2:No + 0 *1
     3 ? 1:Yes + *1 0
     4 ? 1:Yes + *1 0
     5 ? 1:Yes + *1 0
     6 ? 2:No + 0 *1
     7 ? 1:Yes + *1 0
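
To see where these predictions come from, the pruned J48 tree printed in the question can be transcribed by hand into a few lines of plain Java. This is only a sketch for reading the tree. It does not use the Weka API, and the names `TennisTreeSketch` and `predict` are hypothetical.

```java
// Hand-written transcription of the pruned J48 tree from the question's output.
// TennisTreeSketch and predict are hypothetical names, not part of Weka.
class TennisTreeSketch {
    static String predict(String outlook, String humidity, String wind) {
        switch (outlook) {
            case "Sunny":
                return humidity.equals("High") ? "No" : "Yes";
            case "Overcast":
                return "Yes";
            case "Rain":
                return wind.equals("Weak") ? "Yes" : "No";
            default:
                throw new IllegalArgumentException("Unknown Outlook: " + outlook);
        }
    }

    public static void main(String[] args) {
        // Rows D1..D7 of the unlabelled test set: Outlook, Humidity, Wind.
        // Day and Temperature are omitted because the tree never tests them.
        String[][] rows = {
            {"Sunny", "High", "Weak"},
            {"Sunny", "High", "Strong"},
            {"Overcast", "High", "Weak"},
            {"Rain", "High", "Weak"},
            {"Rain", "Normal", "Weak"},
            {"Rain", "Normal", "Strong"},
            {"Overcast", "Normal", "Strong"},
        };
        for (int i = 0; i < rows.length; i++) {
            String label = predict(rows[i][0], rows[i][1], rows[i][2]);
            System.out.println((i + 1) + " ? " + label);
        }
    }
}
```

For rows D1 to D7 this yields No, No, Yes, Yes, Yes, No, Yes, the same labels as in the sample output above.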
