weka - How to interpret Weka Logistic Regression output?

Question

Please help me interpret the logistic regression results produced by weka.classifiers.functions.Logistic from the Weka library.

I use the numeric data from the Weka examples:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

To create the logistic regression model, I use this command:

java -cp $WEKA_INS/weka.jar weka.classifiers.functions.Logistic -t $WEKA_INS/data/weather.numeric.arff -T $WEKA_INS/data/weather.numeric.arff -d ./weather.numeric.model.arff

Here the three arguments mean:

-t <name of training file> : Sets training file.
-T <name of test file> : Sets test file. 
-d <name of output file> : Sets model output file.

Running the above command produces the following output:

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508


Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===
Correctly Classified Instances          11               78.5714 %
Incorrectly Classified Instances         3               21.4286 %
Kappa statistic                          0.5532
Mean absolute error                      0.2066
Root mean squared error                  0.3273
Relative absolute error                 44.4963 %
Root relative squared error             68.2597 %
Total Number of Instances               14     

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

Questions:

1) First section of the report:

Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

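1.1) Do I understand correctly that the "Coefficients" are in fact weights that are applied to each attribute before the attributes are added together to produce the value of the class attribute "play" equal to "yes"?

2) Second section of the report: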

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508

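2.1) What is the meaning of the "Odds Ratios"?

2.2) Do they also all relate to the class attribute "play" being equal to "yes"?

2.3) Why is the value for "outlook=overcast" so much bigger than the value for "outlook=sunny"?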

3)

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

3.1) What is the meaning of the confusion matrix?

Thanks a lot for your help!

score 12 · Accepted Answer

Questions:

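Update from the comments below: the coefficients are in fact the weights applied to each attribute. The weighted sum is plugged into the logistic function 1/(1+exp(-weighted_sum)) to obtain a probability. Note that the "Intercept" value is added to the sum as is, without being multiplied by any of your variables. The result is the probability that the new instance belongs to class yes (> 0.5 means yes).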
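To make the weighted sum concrete, here is a minimal Python sketch that plugs the coefficients from the output into the logistic function by hand. The function name `prob_yes` is made up for illustration. The encoding of `windy` is an assumption (0 for TRUE, 1 for FALSE, following the order in which the values are declared in the ARFF header); the Weka output does not state it, although with this encoding the rounded coefficients appear to reproduce the confusion matrix from the question.

```python
import math

# Coefficients for class "yes", copied from the Weka output in the question.
COEF = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}
INTERCEPT = 22.234


def prob_yes(outlook, temperature, humidity, windy):
    """Return the estimated P(play = yes) for one instance (hypothetical helper)."""
    # The intercept enters the sum as is, without any multiplier.
    z = INTERCEPT
    # A nominal attribute contributes the coefficient of the value that is present.
    z += COEF["outlook=" + outlook]
    # Numeric attributes are multiplied by their coefficients.
    z += COEF["temperature"] * temperature
    z += COEF["humidity"] * humidity
    # ASSUMPTION: windy is encoded as 0 for TRUE and 1 for FALSE.
    z += COEF["windy"] * (0.0 if windy else 1.0)
    # Logistic function turns the weighted sum into a probability.
    return 1.0 / (1.0 + math.exp(-z))


# First instance of the data set: sunny,85,85,FALSE (actual class: no).
p = prob_yes("sunny", 85, 85, False)
print(f"P(yes) = {p:.2f} -> predicted class: {'yes' if p > 0.5 else 'no'}")
```

Because the coefficients are rounded to four decimals, probabilities computed this way can differ slightly from what Weka reports internally.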
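The odds ratios indicate how much influence a change in that value (or a change to that value) will have on the prediction. Each odds ratio is the exponential of the corresponding coefficient, for example exp(-0.0776) ≈ 0.925 for temperature, so they describe the same class (yes) as the coefficients do. I think this link does a good job of explaining odds ratios. The value for outlook=overcast is so large because if the outlook is overcast, the odds are very good that play will equal yes (all four overcast instances in the data have play = yes).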
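The relationship between the two tables can be checked directly: exponentiating each coefficient gives the value printed under "Odds Ratios". The following is a small sketch using the rounded coefficients from the question, so the last digit may differ from Weka's printout.

```python
import math

# Coefficients for class "yes", copied from the Weka output in the question.
coefficients = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}

# An odds ratio is exp(coefficient): the factor by which the odds of "yes"
# are multiplied when the attribute increases by one unit (or, for an
# indicator such as outlook=overcast, when it switches from 0 to 1).
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name:<20}{ratio:>14.4f}")
```

Read this way, a ratio below 1 (temperature, humidity) lowers the odds of yes as the attribute grows, while a ratio far above 1 (outlook=overcast) raises them sharply.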
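The confusion matrix simply shows how many of the test data points were classified correctly and incorrectly. In your example, 7 instances of class a (yes) were correctly classified as a, whereas 2 instances of a were misclassified as b (no). Your question is answered more thoroughly in this question: How to read the classifier confusion matrix in WEKA.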
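As a quick illustration of how the matrix connects to the summary statistics, the sketch below recomputes the accuracy from the four cells. The variable names are only for illustration; rows are the actual classes and columns are the predicted classes, as in the Weka output.

```python
# Confusion matrix from the question: rows = actual class, columns = predicted class.
labels = ["yes", "no"]
matrix = [
    [7, 2],  # actual yes: 7 predicted yes, 2 predicted no
    [1, 4],  # actual no:  1 predicted yes, 4 predicted no
]

total = sum(sum(row) for row in matrix)
# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

print(f"correct = {correct}, incorrect = {total - correct}")
print(f"accuracy = {accuracy:.4%}")

# Per-class recall: the share of each actual class that was predicted correctly.
for i, label in enumerate(labels):
    recall = matrix[i][i] / sum(matrix[i])
    print(f"recall for {label} = {recall:.4f}")
```

The diagonal sums to 11 of 14 instances, which matches the "Correctly Classified Instances" line (78.5714 %) in the output.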
