r - How to count the observations falling in each node of a tree

Question

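I am currently working with the wine data in the MMST package. I have split the whole dataset into training and test sets and built a tree with code like the following: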

library("rpart")
library("gbm")
library("randomForest")
library("MMST")

data(wine)
aux <- c(1:178)
train_indis <- sample(aux, 142, replace = FALSE)
test_indis <- setdiff(aux, train_indis)

train <- wine[train_indis,]
test <- wine[test_indis,]    #### divide the dataset into training and testing

model.control <- rpart.control(minsplit = 5, xval = 10, cp = 0)
fit_wine <- rpart(class ~ MalicAcid + Ash + AlcAsh + Mg + Phenols + Proa + Color + Hue + OD + Proline, data = train, method = "class", control = model.control)

dev.new()    #### portable replacement for windows(), which exists only on Windows
plot(fit_wine, branch = 0.5, uniform = TRUE, compress = TRUE, main = "Full Tree: without pruning")
text(fit_wine, use.n = TRUE, all = TRUE, cex = .6)

This gives me a plot like this: the tree without pruning

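What do the numbers under each node mean (for example, 0/1/48 under Grignolino)? If I want to know how many training and test samples fall in each node, what should I write in the code?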

score 9 · Accepted Answer

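The numbers give the count of members of each class in that node. So the label "0/1/48" tells us that there are 0 cases of class 1 (Barbera, I infer), only one example of class 2 (Barolo), and 48 examples of class 3 (Grignolino).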

You can get detailed information about the tree and each node with summary(fit_wine).
See ?summary.rpart for more details.

You can also use predict() (which will call predict.rpart()) to see how the tree classifies a dataset. For example, predict(fit_wine, train, type = "class"). Or wrap it in a table for easier viewing: table(predict(fit_wine, train, type = "class"), train[,"class"])
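As a minimal sketch of the same idea applied to held-out data, the block below cross-tabulates predicted against actual classes. It assumes the built-in iris data as a stand-in for the MMST wine data, and the names train_idx, fit, conf_test and accuracy are illustrative only, not from the question.

```r
library(rpart)

# Assumption: iris stands in for the wine data, which ships with MMST.
set.seed(1)
train_idx <- sample(nrow(iris), 120)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- rpart(Species ~ ., data = train, method = "class",
             control = rpart.control(minsplit = 5, cp = 0))

# Rows are predicted classes, columns are the true classes.
conf_test <- table(predicted = predict(fit, test, type = "class"),
                   actual    = test$Species)
print(conf_test)

# Share of test cases on the diagonal, i.e. classified correctly.
accuracy <- sum(diag(conf_test)) / sum(conf_test)
print(accuracy)
```

Off-diagonal cells show which classes the tree confuses with each other.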

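If you specifically want to know which leaf node an observation falls in, this information is stored in fit_wine$where. For each case in the dataset, fit_wine$where contains the row number of fit_wine$frame that represents the leaf node the case falls in. So we can get the leaf information for each case with: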

trainingnodes <- rownames(fit_wine$frame)[fit_wine$where]
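Once each training case has a leaf label, counting the cases per leaf is a single table() call. The sketch below assumes the built-in iris data as a stand-in for the wine data; leaf_id, train_counts and by_class are illustrative names.

```r
library(rpart)

# Assumption: iris stands in for the wine data, which ships with MMST.
set.seed(1)
train_idx <- sample(nrow(iris), 120)
train <- iris[train_idx, ]

fit <- rpart(Species ~ ., data = train, method = "class",
             control = rpart.control(minsplit = 5, cp = 0))

# fit$where indexes rows of fit$frame; the row names are the node numbers.
leaf_id <- rownames(fit$frame)[fit$where]

# Number of training observations in each leaf.
train_counts <- table(leaf_id)
print(train_counts)

# Per-class breakdown for each leaf, the same kind of numbers
# that text(..., use.n = TRUE) prints under the leaves.
by_class <- table(leaf = leaf_id, class = train$Species)
print(by_class)

# Cross-check against the n column that rpart stores for each leaf.
leaves <- fit$frame[fit$frame$var == "<leaf>", ]
```

With unweighted data and no missing values, these counts should agree with the n column of fit$frame for the leaf rows.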

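To get the leaf information for the test data, I used to run predict() with type = "matrix" and infer it. Confusingly, this returns a matrix produced by concatenating the predicted class, the class counts at that node of the fitted tree, and the class probabilities. So for this example: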

testresults <- predict(fit_wine, test, type = "matrix")
testresults <- data.frame(testresults)
names(testresults) <- c("ClassGuess","NofClass1onNode", "NofClass2onNode",
     "NofClass3onNode", "PClass1", "PClass2", "PClass3")

From this we can infer the different nodes, for example from unique(testresults[,2:4]), but it is inelegant.

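However, Yuji has a clever trick in a previous question. He copies the rpart object and substitutes the node numbers for the classes, so running predict returns the node instead of the class: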

nodes_wine <- fit_wine
nodes_wine$frame$yval = as.numeric(rownames(nodes_wine$frame))
testnodes <- predict(nodes_wine, test, type="vector")

I have included the solution here, but people should go and upvote him.
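Putting the pieces together, the sketch below counts both training and test observations per leaf, using fit$where for the training data and the node-substitution trick above for the test data. It assumes the built-in iris data as a stand-in for the wine data; nodes_fit, leaf_levels and counts are illustrative names.

```r
library(rpart)

# Assumption: iris stands in for the wine data, which ships with MMST.
set.seed(1)
train_idx <- sample(nrow(iris), 120)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- rpart(Species ~ ., data = train, method = "class",
             control = rpart.control(minsplit = 5, cp = 0))

# Copy the fit and overwrite the predicted values with the node numbers.
nodes_fit <- fit
nodes_fit$frame$yval <- as.numeric(rownames(nodes_fit$frame))

# Leaf of each test case (via predict) and of each training case (via where).
test_nodes  <- predict(nodes_fit, test, type = "vector")
train_nodes <- as.numeric(rownames(fit$frame)[fit$where])

# Use one shared set of leaf labels so leaves that receive no test cases show 0.
leaf_levels <- rownames(fit$frame)[fit$frame$var == "<leaf>"]
counts <- data.frame(
  node  = leaf_levels,
  train = as.vector(table(factor(train_nodes, levels = leaf_levels))),
  test  = as.vector(table(factor(test_nodes,  levels = leaf_levels)))
)
print(counts)
```

Fixing the factor levels before tabulating keeps the train and test columns aligned row by row.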
