rpart() is perfectly capable of handling a response with more than two classes. Try:
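library(rpart)                            # load the package (errors if it is not installed)
mod <- rpart(Species ~ ., data = iris)    # classify all 3 species from the 4 measurements
mod                                       # print the fitted tree
plot(mod)                                 # draw the tree skeleton
text(mod)                                 # add the split and node labels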
When run with the default settings, it produces a tree with 3 terminal nodes:
R> mod
n= 150

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50 0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
    6) Petal.Width< 1.75 54 5 versicolor (0.00000000 0.90740741 0.09259259) *
    7) Petal.Width>=1.75 46 1 virginica (0.00000000 0.02173913 0.97826087) *
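A quick way to confirm that all three classes are used is to predict back onto the training data. This is only an illustrative sketch: `pred` and `tab` are names chosen here, and the resubstitution table is optimistic because the tree is scored on the same data it was grown on.

```r
library(rpart)

# Default fit, as above
mod <- rpart(Species ~ ., data = iris)

# Class predictions on the training data: a factor with the three levels of Species
pred <- predict(mod, type = "class")

# Resubstitution confusion matrix (rows = predicted, columns = observed)
tab <- table(predicted = pred, observed = iris$Species)
tab

# Class probabilities: one column per level of the response
head(predict(mod, type = "prob"))
```

The misclassified cases in this table correspond to the `loss` column of the terminal nodes in the printed tree.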
The recursive partitioning algorithm stops building a tree when certain stopping rules are met. There is no point splitting a node that is already pure (contains a single class); in addition, by default a node must have at least 20 observations before a split is attempted (minsplit), no split is made that would leave a terminal node with fewer than 7 observations (minbucket), and a split is not attempted if it does not decrease the overall lack of fit by a factor of 0.01 (cp), and so on. Some of these can be controlled through the rpart.control() function.
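Because rpart.control() simply returns a list of settings, the defaults mentioned above can be inspected directly. A minimal check:

```r
library(rpart)

ctrl <- rpart.control()   # all settings left at their defaults

ctrl$minsplit    # 20: min. observations in a node before a split is attempted
ctrl$minbucket   # 7: min. observations in any terminal node, round(minsplit / 3)
ctrl$cp          # 0.01: complexity parameter (required improvement in fit)
ctrl$maxdepth    # 30: maximum depth of any node, with the root at depth 0
```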
From what limited information you have given us, I can only conclude that these defaults are inappropriate for your data set and you should adjust them accordingly, e.g.:
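# Relax the stopping rules: attempt splits on nodes with as few as 2 observations,
# allow single-observation leaves, and accept almost any improvement in fit
ctrl <- rpart.control(minsplit = 2, minbucket = 1, cp = 0.00001)
mod2 <- rpart(Species ~ ., data = iris, control = ctrl)
mod2
plot(mod2)
text(mod2)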
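To see how much the relaxed settings change the fit, one can count the terminal nodes of each tree. This sketch relies on the `frame` component of a fitted rpart object, whose `var` column marks leaves as `<leaf>`; `n_leaves()` is a hypothetical helper defined only for this example.

```r
library(rpart)

# Hypothetical helper: number of terminal nodes in an rpart fit
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

mod  <- rpart(Species ~ ., data = iris)
ctrl <- rpart.control(minsplit = 2, minbucket = 1, cp = 0.00001)
mod2 <- rpart(Species ~ ., data = iris, control = ctrl)

n_leaves(mod)    # 3 with the default settings
n_leaves(mod2)   # many more once the stopping rules are relaxed
```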
For this example data set, that produces a much larger tree:
R> mod2
n= 150

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50 0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
    6) Petal.Width< 1.75 54 5 versicolor (0.00000000 0.90740741 0.09259259)
      12) Petal.Length< 4.95 48 1 versicolor (0.00000000 0.97916667 0.02083333)
        24) Petal.Width< 1.65 47 0 versicolor (0.00000000 1.00000000 0.00000000) *
        25) Petal.Width>=1.65 1 0 virginica (0.00000000 0.00000000 1.00000000) *
      13) Petal.Length>=4.95 6 2 virginica (0.00000000 0.33333333 0.66666667)
        26) Petal.Width>=1.55 3 1 versicolor (0.00000000 0.66666667 0.33333333)
          52) Sepal.Length< 6.95 2 0 versicolor (0.00000000 1.00000000 0.00000000) *
          53) Sepal.Length>=6.95 1 0 virginica (0.00000000 0.00000000 1.00000000) *
        27) Petal.Width< 1.55 3 0 virginica (0.00000000 0.00000000 1.00000000) *
    7) Petal.Width>=1.75 46 1 virginica (0.00000000 0.02173913 0.97826087)
      14) Petal.Length< 4.85 3 1 virginica (0.00000000 0.33333333 0.66666667)
        28) Sepal.Length< 5.95 1 0 versicolor (0.00000000 1.00000000 0.00000000) *
        29) Sepal.Length>=5.95 2 0 virginica (0.00000000 0.00000000 1.00000000) *
      15) Petal.Length>=4.85 43 0 virginica (0.00000000 0.00000000 1.00000000) *
This tree, however, is most likely highly over-fitted to the data.
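One common remedy is cost-complexity pruning: grow a large tree, then cut it back to the cp value whose subtree has the lowest cross-validated error (`xerror`) in the fit's `cptable`. The sketch below uses that minimum-error rule; the "1-SE" rule is a frequently used, more conservative alternative. The cross-validation folds are random, hence the seed, and the seed value itself is arbitrary.

```r
library(rpart)

set.seed(42)  # xerror comes from 10-fold cross-validation (default xval = 10)
ctrl <- rpart.control(minsplit = 2, minbucket = 1, cp = 0.00001)
mod2 <- rpart(Species ~ ., data = iris, control = ctrl)

# One row per candidate subtree: CP, nsplit, rel error, xerror, xstd
printcp(mod2)

# Pick the cp whose subtree has the smallest cross-validated error
best_cp <- mod2$cptable[which.min(mod2$cptable[, "xerror"]), "CP"]

# Cut the large tree back to that size
mod2_pruned <- prune(mod2, cp = best_cp)
mod2_pruned
```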
That said, there are, of course, other packages that can fit trees to data sets and that, like rpart(), can handle a response with more than two levels. The main ones are listed in the Machine Learning & Statistical Learning Task View on CRAN, which you should consult. One such package is party.
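For comparison, a conditional inference tree from party can be fitted with the same formula interface. This is a minimal sketch that assumes party is already installed (install.packages("party")); ctree() decides whether to split using significance tests, which act as its stopping rule, rather than rpart's cp-based criterion.

```r
# Assumes the party package is installed: install.packages("party")
library(party)

ct <- ctree(Species ~ ., data = iris)
ct          # print the fitted tree
plot(ct)    # plot with per-node class distributions

# Predicted classes on the training data: a factor with the levels of Species
table(predicted = predict(ct), observed = iris$Species)
```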