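I am trying to use conditional inference trees in R to obtain a counterfactual distribution based on the predictions for the types/splits obtained with ctree.

I am using the following code: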

 # First try ctree on a single country
 de_fact <- subset(ess, cntry == "IT")

 # Keep only the needed variables
 xvars <- c("fisei", "misei", "edu_father", "edu_mother", "gender", "emprf14", "emprm14")
 yvar <- "isei_respondent"

 de_fact <- de_fact[!is.na(de_fact$isei_respondent), c(yvar, xvars)]

 # Split the data into train and test sets (roughly 70/30)
 set.seed(123)
 ind <- sample(2, nrow(de_fact), replace = TRUE, prob = c(0.7, 0.3))
 train <- de_fact[ind==1, ]
 test <- de_fact[ind==2, ]

The summary of the data frame is as follows:

isei_respondent     fisei           misei                 edu_father 
Min.   :16.00   Min.   :16.00   Min.   :16.00   <= Primary     :800  
1st Qu.:30.00   1st Qu.:26.00   1st Qu.:23.00   Lower II       :315  
Median :40.00   Median :36.00   Median :39.00   Upper II       :173  
Mean   :41.86   Mean   :37.67   Mean   :39.44   Post-II non-III:  0  
3rd Qu.:52.00   3rd Qu.:47.50   3rd Qu.:49.00   Tertiary       : 81  
Max.   :90.00   Max.   :88.00   Max.   :80.00   NA's           : 34  
                NA's   :177     NA's   :959                          
       edu_mother     gender             emprf14   
<= Primary     :926   Female:645   Employee     :857  
Lower II       :272   Male  :758   Self-employed:484  
Upper II       :148                Not work     :  9  
Post-II non-III:  0                Dead/Absent  : 21  
Tertiary       : 26                NA's         : 32  
NA's           : 31                                   
                                                   
      emprm14   
Employee     :297  
Self-employed:186  
Not work     :867  
Dead/Absent  : 21  
NA's         : 32   

I fit the ctree on the training data and predict on the test data as follows:

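 library(partykit)  # provides ctree(), ctree_control(), node_party(), info_node()

 # Fit the tree on the training data
 try <- ctree(isei_respondent ~ ., data = train,
              control = ctree_control(maxsurrogate = 3, mincriterion = 0.99))
 try
 info_node(node_party(try))
 # Predict on the held-out test data
 predict(try, newdata = test)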

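However, for country IT, the length of my predictions does not match the length of the test data. Specifically, for test data with 405 observations I get only 108 predictions. Any ideas about what I am doing wrong and what causes this mismatch?

Thanks for your support!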
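To narrow this down, the following is a minimal diagnostic sketch rather than a confirmed explanation. It compares the number of predictions with the number of rows and the number of complete cases in the new data, on the assumption (not verified here) that rows with missing predictors may be involved, since the summary above shows many NA's in `fisei` and `misei`. The helper `check_prediction_length` and the toy objects `toy_test` and `toy_pred` are hypothetical and only stand in for `test` and the output of `predict()`.

```r
# Hypothetical helper (not part of partykit): compare the number of
# predictions with the number of rows and of complete cases in newdata.
check_prediction_length <- function(pred, newdata) {
  c(n_rows     = nrow(newdata),
    n_pred     = length(pred),
    n_complete = sum(complete.cases(newdata)))
}

# Toy stand-in for the test set, with missing values in two predictors
toy_test <- data.frame(
  isei_respondent = c(30, 45, 52, 61),
  fisei           = c(26, NA, 47, 36),
  misei           = c(NA, NA, 39, 23)
)

# Toy stand-in for a prediction vector that is shorter than the test set
toy_pred <- c(50, 40)

res <- check_prediction_length(toy_pred, toy_test)
print(res)
```

On the real objects this would be called as `check_prediction_length(predict(try, newdata = test), test)`. If `n_pred` turns out to equal `n_complete`, that would suggest the mismatch is related to the missing values, but this is only a hypothesis to test.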
