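I am trying to change my classifier's output from classifying instances as 0 or 1 to instead giving a score (a confidence measure?), say between 0 and 10. I am using the RIDOR classifier, but I could just as easily use ClassificationViaRegression, RandomForest, or AttributeSelectedClassifier, although they do not classify as well.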

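I have printed everything I can to the terminal (with every option ticked), but I cannot find a confidence measure anywhere in the predictions. Also, as far as I know, none of these classifiers has an option to output source code? If that is the case, I will have to code the classifier by hand.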

Here is a sample of the generated rules:

    class = 2  (40536.0/20268.0)
       Except (fog <= 14.115114) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence <= 1.245) and (Characters/Word > 4.331715) => class = 1  (2309.0/5.0) [1137.0/4.0]
       Except (fog <= 14.115598) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence > 1.514706) => class = 1  (2281.0/0.0) [1112.0/0.0]
       Except (fog <= 14.136126) and (Words/Sentence > 19.651515) and (polySyllableCount <= 10.5) and (polySyllabicWords/Sentence > 2.416667) and (Syllables/Sentence <= 34.875) => class = 1  (601.0/0.0) [303.0/6.0]
       Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (wordCount > 29.5) and (Characters/Word <= 4.83156) => class = 1  (333.0/0.0) [152.0/0.0]
       Except (fog <= 14.142217) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (numOfChars > 30.5) and (Syllables/Word <= 1.474937) => class = 1  (322.0/0.0) [174.0/4.0]
       Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) and (polySyllableCount <= 4.5) => class = 1  (580.0/28.0) [298.0/21.0]
       Except (fog <= 14.141508) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.683333) and (sentenceCount <= 4.5) and (polySyllabicWords/Sentence <= 2.291667) and (fog > 12.269468) => class = 1  (434.0/0.0) [202.0/0.0]
       Except (fog <= 14.140863) and (Syllables/Sentence > 25.866071) and (polySyllableCount <= 16.5) and (fog > 12.793102) and (polySyllabicWords/Sentence <= 2.9) and (wordCount <= 59.5) and (Words/Sentence > 16.166667) and (Words/Sentence <= 24.75) => class = 1  (291.0/0.0) [166.0/0.0]
       Except (fog <= 14.140863) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.630682) and (polySyllabicWords/Sentence > 2.656863) and (polySyllableCount <= 16.5) and (fog > 13.560337) and (Words/Sentence <= 21.55) and (numOfChars <= 523) => class = 1  (209.0/0.0) [93.0/2.0]
       Except (fog <= 14.147578) and (Syllables/Word <= 1.649029) and (polySyllabicWords/Sentence <= 1.75) and (polySyllabicWords/Sentence > 1.303846) and (polySyllabicWords/Sentence <= 1.422619) and (fog > 9.327132) => class = 1  (183.0/0.0) [64.0/0.0]......

I am also not sure what the first line (40536.0/20268.0) means. Does it just mean "classify as 2 unless one of the following rules applies"?

Any help is much appreciated!


1 Answer


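In general, getting a confidence value out of a classifier is not easy, especially if you want it to be calibrated (for example, expressed as the probability that the classification is correct). However, there are a few relatively simple ways to get a rough estimate.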

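For tree-based and rule-based classifiers, the numbers in parentheses indicate the number of correct/incorrect samples that fall into the bucket. So, for example, a bucket with (20,2) means that, based on the training data, there were 20 cases for which the rule is correct and 2 for which it is incorrect. You can use this ratio as a rough measure of confidence.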
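As a rough illustration of that ratio, here is a minimal Python sketch that pulls the first "(a/b)" pair out of a rule line like the ones in the question and turns it into a fraction. The function name `rule_confidence` and the `first_is_total` flag are made up for this example. By default it follows the reading given above (first number correct, second incorrect); the flag covers the alternative assumption that the first number is the total covered by the rule and the second is the number misclassified.

```python
import re

# Matches the first "(a/b)" pair of numbers on a rule line, e.g. "(2309.0/5.0)".
# Conditions such as "(fog <= 14.1)" are skipped because they do not start with a digit.
COUNT_PAIR = re.compile(r"\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)")


def rule_confidence(rule_line, first_is_total=False):
    """Return the fraction of training cases the rule got right (rough estimate)."""
    match = COUNT_PAIR.search(rule_line)
    if match is None:
        raise ValueError("no (a/b) count pair found in rule line")
    first, second = float(match.group(1)), float(match.group(2))
    if first_is_total:
        # Assumption: (covered / misclassified)
        correct, total = first - second, first
    else:
        # Reading used in this answer: (correct / incorrect)
        correct, total = first, first + second
    return correct / total if total > 0 else 0.0


line = ("Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) "
        "and (polySyllableCount <= 4.5) => class = 1  (580.0/28.0) [298.0/21.0]")
print(round(rule_confidence(line), 3))
```

Multiplying the result by 10 would give a score on the 0 to 10 scale mentioned in the question. This is an uncalibrated, training-set estimate, so treat it as a ranking aid rather than a probability.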

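When using regression, you can have WEKA output the classifier's actual numeric result (rather than just the class) and base a confidence measure on that.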
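One simple way to turn such a numeric output into a score is to clamp it to the range spanned by the two class codes and rescale it linearly. The sketch below assumes the classes are coded as 0 and 1 and that a larger output means "more like class 1"; the function name and the default range are illustrative choices, not something WEKA provides.

```python
def score_from_regression(value, low=0.0, high=1.0, scale=10.0):
    """Map a numeric regression output onto a 0..scale score.

    Assumption: `low` and `high` are the numeric codes of the two classes.
    Outputs outside that range are clamped before rescaling.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    return scale * (clamped - low) / (high - low)


print(score_from_regression(0.5))
print(score_from_regression(1.3))
```

If the classes are coded differently (for example 1 and 2, as in the rule listing in the question), pass `low=1.0, high=2.0`. The resulting score is only as meaningful as the regression output itself and is not calibrated.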

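More generally, according to the documentation, you can use the -p option on the command line (see here). However, I am not sure how those numbers are computed.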

answered 2013-03-21T14:24:01.763