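I am trying to use conditional inference trees in R to obtain a counterfactual distribution based on the types/splits predicted by ctree.
I am using the following code: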
library(partykit)  # provides ctree(), ctree_control(), node_party(), info_node()
# First, try ctree on a single country
de_fact <- subset(ess, cntry == "IT")
#Keep only the needed variables
xvars <- c("fisei", "misei", "edu_father", "edu_mother", "gender", "emprf14", "emprm14")
yvar <- "isei_respondent"
de_fact <- de_fact[!is.na(de_fact$isei_respondent), c(yvar, xvars)]
# Split the data into training (about 70%) and test (about 30%) sets
set.seed(123)
ind <- sample(2, nrow(de_fact), replace = TRUE, prob = c(0.7, 0.3))
train <- de_fact[ind==1, ]
test <- de_fact[ind==2, ]
The summary of the data frame is as follows:
isei_respondent   fisei           misei           edu_father
Min.   :16.00     Min.   :16.00   Min.   :16.00   <= Primary     :800
1st Qu.:30.00     1st Qu.:26.00   1st Qu.:23.00   Lower II       :315
Median :40.00     Median :36.00   Median :39.00   Upper II       :173
Mean   :41.86     Mean   :37.67   Mean   :39.44   Post-II non-III:  0
3rd Qu.:52.00     3rd Qu.:47.50   3rd Qu.:49.00   Tertiary       : 81
Max.   :90.00     Max.   :88.00   Max.   :80.00   NA's           : 34
                  NA's   :177     NA's   :959

edu_mother            gender       emprf14
<= Primary     :926   Female:645   Employee     :857
Lower II       :272   Male  :758   Self-employed:484
Upper II       :148                Not work     :  9
Post-II non-III:  0                Dead/Absent  : 21
Tertiary       : 26                NA's         : 32
NA's           : 31

emprm14
Employee     :297
Self-employed:186
Not work     :867
Dead/Absent  : 21
NA's         : 32
I fit the ctree on the training data and predict on the test data as follows:
try <- ctree(isei_respondent ~ ., data=train,
control=ctree_control(maxsurrogate=3, mincriterion = 0.99))
try
info_node(node_party(try))
predict(try, newdata = test)
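However, for country IT, the length of my predictions does not match the length of the test data. Specifically, for a test set of 405 observations, I get only 108 predictions. Any ideas about what I am doing wrong and what causes this mismatch?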
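One check that might narrow this down, assuming the gap comes from rows with missing predictor values (an assumption on my part, not something I have confirmed): count how many test rows are complete across all predictors and compare that count with the number of predictions. The sketch below uses a small made-up data frame (`toy_test` and `toy_xvars` are hypothetical stand-ins for `test` and `xvars`) and only base R.

```r
# Toy stand-in for the test set (made-up values, NOT the ESS data)
toy_test <- data.frame(
  isei_respondent = c(40, 52, 30, 61, 25),
  fisei           = c(36, NA, 26, 47, 30),
  misei           = c(NA, 39, 23, NA, 49),
  gender          = factor(c("Female", "Male", "Male", "Female", "Male"))
)
toy_xvars <- c("fisei", "misei", "gender")

# Total number of rows in the (toy) test set
n_total <- nrow(toy_test)

# Number of rows with no missing value in any predictor
n_complete <- sum(complete.cases(toy_test[, toy_xvars]))

# Missing values per predictor, to see which column loses the most rows
na_per_var <- colSums(is.na(toy_test[, toy_xvars]))

print(c(total = n_total, complete = n_complete))
print(na_per_var)
```

On the real data this would be `sum(complete.cases(test[, xvars]))`. If that number came out as 108, missing values in the predictors (misei alone has 959 NA's in the summary above) would be a plausible explanation for the shorter prediction vector.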
Thank you for your support!