r - How to compute error rate from a decision tree?

Question

Does anyone know how to calculate the error rate for a decision tree with R? I am using the `rpart()` function.

score 58 · Accepted Answer

Assuming you mean computing the error rate on the sample used to fit the model, you can use `printcp()`. For example, using the on-line example,

```
> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018
```

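The `Root node error` is used to compute two measures of predictive performance. Each is obtained by multiplying it by a value from the `rel error` or `xerror` column, and each depends on the complexity parameter (first column):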

0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample); this is roughly the value returned by:
```
class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)
1-sum(diag(class.pred))/sum(class.pred)
```
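0.82353 x 0.20988 = 0.1728425 (17.3%) is the cross-validated error rate (using 10-fold CV, see `xval` in `rpart.control()`; but see also `xpred.rpart()` and `plotcp()`, which rely on this kind of measure). This measure is a more objective indicator of predictive accuracy.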
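The two products above can also be computed without reading numbers off the printed table. The following is a minimal sketch, not part of the original answer. It assumes the default 0/1 loss, so that `fit$frame$dev` holds misclassification counts, and it assumes `xpred.rpart()` returns predicted classes as integer level codes. The variable names and the seed are illustrative only.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Root node error: misclassified cases at the root divided by n (17/81 here).
# Assumption: with the default loss, frame$dev is a misclassification count.
root_err <- fit$frame$dev[1] / fit$frame$n[1]

# The last row of the CP table corresponds to the full (unpruned) tree.
cp_tab <- fit$cptable
last <- nrow(cp_tab)

# Resubstitution (training-sample) error rate.
resub_err <- cp_tab[last, "rel error"] * root_err

# Cross-validated error rate; it changes from run to run because the
# CV folds are drawn at random when the tree is fitted.
cv_err <- cp_tab[last, "xerror"] * root_err

# Cross-check: direct misclassification rate on the training sample.
train_err <- mean(predict(fit, type = "class") != kyphosis$Kyphosis)

# A fresh 10-fold CV estimate from xpred.rpart(); one column per
# complexity value, the last column being the largest tree.
set.seed(1)  # arbitrary seed, only to make this run repeatable
xp <- xpred.rpart(fit, xval = 10)
cv_err_manual <- mean(xp[, ncol(xp)] != as.integer(kyphosis$Kyphosis))

c(resub = resub_err, train = train_err, cv = cv_err, cv_manual = cv_err_manual)
```

Here `resub_err` and `train_err` should agree, while `cv_err` and `cv_err_manual` will generally differ slightly because they come from different random fold assignments.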

Note that it is more or less in agreement with the classification accuracy from `tree`:

```
> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10
Residual mean deviance:  0.5809 = 41.24 / 71
Misclassification error rate: 0.1235 = 10 / 81
```

where `Misclassification error rate` is computed from the training sample.
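The same training-sample figure can be reproduced by hand, which makes it explicit that no held-out data is involved. This is a rough sketch that assumes the `tree` package is installed and that `predict()` on a `tree` object with `type = "class"` and no new data returns the fitted classes. The names `tree_fit`, `tree_pred` and `n_wrong` are illustrative.

```r
library(rpart)  # provides the kyphosis data set
library(tree)   # assumed to be installed

tree_fit <- tree(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Fitted classes for the training observations (no newdata supplied).
tree_pred <- predict(tree_fit, type = "class")

# Misclassified training cases, and the corresponding error rate.
n_wrong <- sum(tree_pred != kyphosis$Kyphosis)
n_wrong / nrow(kyphosis)
```

With the fit shown above, `n_wrong` should match the 10 misclassified cases out of 81 reported by `summary()`.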
