Surrogate splits:
    ##       bmi    < 21.51 to the right, agree=0.858, adj=0.632, (0 split)

I understand that this split sends cases to the right child node when bmi < 21.51, that it closely matches the primary split (agree = 0.858), and that it gives a decent decrease in node impurity (adj = 0.632).

I do not understand the (0 split) piece of the output. Also, if agreement had a value of 1, would this be suspicious?

Thanks!


1 Answer


Suppose, for example, that 10 observations are missing the variable used in the primary split. rpart will then try to send those observations left or right using the surrogate splits. If 9 of them have a non-missing value for the first surrogate variable, rpart uses that variable for them, and the output shows (9 split) next to this surrogate, because it was used to split 9 observations.

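If no observations were routed by a surrogate, for example because the data are also missing for that surrogate variable (or because nothing was missing in the primary variable to begin with), the output shows (0 split).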
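To make the counting concrete, here is a minimal sketch in plain Python. It is not rpart's code. The data and the helper name `count_surrogate_uses` are made up for illustration, and it assumes only one surrogate is considered.

```python
# Hypothetical data: None marks a missing value.
primary = [30.1, None, 25.4, None, 19.8, None, 22.0]
surrogate = [20.0, 23.5, None, 18.2, 24.1, None, 21.0]


def count_surrogate_uses(primary, surrogate):
    """Count rows that miss the primary variable but have a usable surrogate value."""
    return sum(
        1
        for p, s in zip(primary, surrogate)
        if p is None and s is not None
    )


uses = count_surrogate_uses(primary, surrogate)
print(f"({uses} split)")  # same shape as the figure printed next to a surrogate
```

With complete data in the primary variable the count is 0, which matches the (0 split) in the question and is not a sign of a problem.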

I do not know the exact calculation of agreement, but I would guess that an agreement of 1 means the surrogate split sends every observation in the same direction as the primary split. This could happen if your surrogate variable is, for example, a monotone transformation of the primary variable.
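As a rough illustration, the sketch below computes agreement and adjusted agreement the way I understand the rpart vignette to describe them: agreement is the fraction of observations the surrogate sends the same way as the primary split, and the adjusted value rescales it against the baseline of sending everything in the majority direction. The function name, the data, and the cut points are assumptions made up for this example, not rpart internals.

```python
def agreement(primary_left, surrogate_left):
    """Return (agree, adj) for two lists of booleans (True = sent left)."""
    n = len(primary_left)
    same = sum(p == s for p, s in zip(primary_left, surrogate_left))
    n_left = sum(primary_left)
    # Baseline: send every observation in the primary split's majority direction.
    majority = max(n_left, n - n_left)
    agree = same / n
    # Assumes the primary split sends at least one case each way (n > majority).
    adj = (same - majority) / (n - majority)
    return agree, adj


x = [1, 2, 3, 4, 6, 7, 8, 9, 10, 12]      # hypothetical primary variable
primary_left = [v < 5 for v in x]          # hypothetical primary split: x < 5

# Monotone transformation of x: the split on y mirrors the split on x exactly.
y = [2 * v + 1 for v in x]
perfect = agreement(primary_left, [v < 11 for v in y])

# A noisier variable that disagrees on two observations.
z = [1, 2, 6, 4, 3, 7, 8, 9, 10, 12]
noisy = agreement(primary_left, [v < 5 for v in z])

print(perfect)
print(noisy)
```

In the first case both values are 1, so an agreement of 1 is not suspicious in itself. It simply says the two variables carry the same information for this particular split.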

Answered 2014-02-07T08:21:44.277