machine-learning - I get 0% correct predictions, maybe a wrong setup?

Question

I am using the Mulan library for multi-label classification, with RAkEL as the learner. I followed the Mulan getting-started instructions: http://mulan.sourceforge.net/starting.html

My labels XML file:

<labels xmlns="http://mulan.sourceforge.net/labels"> 
  <label name="1"/> 
  <label name="2"/> 
  <label name="3"/> 
  <label name="4"/> 
  <label name="5"/> 
</labels>

My training data file:

@relation predict_label
@attribute 12345 numeric
@attribute A numeric
@attribute B numeric
@attribute C numeric
@attribute D numeric
@attribute E numeric

@attribute 1 {0, 1}
@attribute 2 {0, 1}
@attribute 3 {0, 1}
@attribute 4 {0, 1}
@attribute 5 {0, 1}

@data
2,3,2,2,2,2,1,0,0,0,0

2,2,3,2,2,2,0,1,0,0,0

2,2,2,3,2,2,0,0,1,0,0

2,2,2,2,3,2,0,0,0,1,0

2,2,2,2,2,3,0,0,0,0,1

My test data file:

@relation catalog_ml
@attribute 12345 numeric
@attribute A numeric
@attribute B numeric
@attribute C numeric
@attribute D numeric
@attribute E numeric

@attribute 1 {0, 1}
@attribute 2 {0, 1}
@attribute 3 {0, 1}
@attribute 4 {0, 1}
@attribute 5 {0, 1}

@data
2,2,2,2,2,3,0,0,0,0,0

The result I get after running the prediction:

Bipartion: [false, false, false, false, false] Confidences: [0.0, 0.0, 0.0, 0.0, 0.0] Ranking: [5, 4, 3, 2, 1]Predicted values: null

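My questions are:
1. Can someone help me verify what I am doing wrong?
2. As I understand it, the ranking [5, 4, 3, 2, 1] refers to the positions of the labels in the XML labels file. Is my understanding correct? Why is the ranking not in order from 1 to 5?
3. Are the predicted values null because this is a multi-label classification test? If not, which learner would return non-null predicted values?

Thank you very much. Any suggestions or comments are very welcome.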

score 0 · Accepted Answer

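I am also new to Mulan, but I can say the following.

Can someone help me verify what I am doing wrong?

You are not doing anything particularly wrong. You simply did not give the classifier enough information to classify your test sample. I added some random rows to your training set: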

@relation predict_label
@attribute 12345 numeric
@attribute A numeric
@attribute B numeric
@attribute C numeric
@attribute D numeric
@attribute E numeric

@attribute 1 {0, 1}
@attribute 2 {0, 1}
@attribute 3 {0, 1}
@attribute 4 {0, 1}
@attribute 5 {0, 1}

@data
2,3,2,2,2,2,1,0,0,0,0
2,2,3,2,2,2,0,1,0,0,0
2,2,2,3,2,2,0,0,1,0,0
2,2,2,2,3,2,0,0,0,1,0
2,2,2,2,2,3,0,0,0,0,1
2,2,2,2,2,2,1,0,1,1,0
1,2,3,4,6,7,0,0,0,1,1
5,4,3,2,1,0,1,1,1,1,1
9,8,7,5,4,3,0,1,1,0,0
1,2,3,2,1,0,0,1,1,1,1
1,5,6,8,9,0,1,1,0,0,1

and got the following result:

Bipartion: [false, false, false, false, false] Confidences: [0.16666666666666666, 0.0, 0.0, 0.16666666666666666, 0.3333333333333333] Ranking: [3, 5, 4, 2, 1]Predicted values: null

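Here the Bipartition is the predicted value for each label, and the Confidences indicate how confident the classifier is in each of those predictions. It is indeed not very confident, but that is because of the "poor" training data set.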
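To make the relation between the two outputs concrete, here is a minimal sketch of how a bipartition can be derived from confidences by thresholding. The class name BipartitionSketch, the method bipartition, and the 0.5 cut-off are assumptions made for illustration; they are not taken from the thread and are not Mulan's API. The sketch is only meant to show why confidences that are all below the cut-off yield an all-false bipartition.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mulan: thresholds confidences into labels.
class BipartitionSketch {

    // A label is predicted (true) when its confidence reaches the threshold.
    static boolean[] bipartition(double[] confidences, double threshold) {
        boolean[] predicted = new boolean[confidences.length];
        for (int i = 0; i < confidences.length; i++) {
            predicted[i] = confidences[i] >= threshold;
        }
        return predicted;
    }

    public static void main(String[] args) {
        // Confidences copied from the output above.
        double[] confidences = {0.16666666666666666, 0.0, 0.0,
                                0.16666666666666666, 0.3333333333333333};
        // The 0.5 threshold is an assumption for this sketch.
        System.out.println(Arrays.toString(bipartition(confidences, 0.5)));
        // prints [false, false, false, false, false]
    }
}
```

Under that assumption, none of the five confidences reaches 0.5, which matches the all-false Bipartition in the output.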

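As I understand it, the ranking [5, 4, 3, 2, 1] refers to the positions of the labels in the XML labels file. Is my understanding correct? Why is the ranking not in order from 1 to 5?

The ranking only shows which of your labels the classifier is most confident about. In your output all confidences are 0, so the labels are listed in some "arbitrary" order, or rather in whatever order the sorting function produces when it has no information to go on. As you can see in my example, the ranking is ordered by confidence.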
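The following sketch shows one way a ranking can be computed from confidences so that it reproduces both rankings in this thread. It assumes that rank 1 goes to the most confident label and that ties are resolved by a stable ascending sort; the class name RankingSketch and the method ranksFromConfidences are hypothetical, and this is not claimed to be Mulan's actual implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Mulan: derives ranks from confidences.
class RankingSketch {

    // ranks[i] is the rank of label i, where 1 marks the highest confidence.
    static int[] ranksFromConfidences(double[] confidences) {
        int n = confidences.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sorting objects with Arrays.sort is stable, so tied labels
        // stay in their original index order (ascending by confidence).
        Arrays.sort(order, Comparator.comparingDouble(i -> confidences[i]));
        int[] ranks = new int[n];
        for (int pos = 0; pos < n; pos++) {
            // The last position in ascending order gets rank 1.
            ranks[order[pos]] = n - pos;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] allZero = {0.0, 0.0, 0.0, 0.0, 0.0};
        double[] mixed = {0.16666666666666666, 0.0, 0.0,
                          0.16666666666666666, 0.3333333333333333};
        System.out.println(Arrays.toString(ranksFromConfidences(allZero)));
        // prints [5, 4, 3, 2, 1]
        System.out.println(Arrays.toString(ranksFromConfidences(mixed)));
        // prints [3, 5, 4, 2, 1]
    }
}
```

With all confidences tied at 0, the stable sort leaves the labels in index order and the ranks come out as 5 down to 1, so the [5, 4, 3, 2, 1] ranking carries no information about the labels themselves.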

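Are the predicted values null because this is a multi-label classification test? If not, which learner would return non-null predicted values?

I actually do not know what they are used for. If anyone has an answer to this, I would be glad to hear it too.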

Edit

If you copy one of the training set rows into the test data set, you get Bipartition values that are not all false.
