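I want to use the random forest method to impute missing values. I've read some papers claiming that MICE with random forests performs better than parametric MICE.
In my case, I have already run a model with the default mice method, got results, and worked with them. However, when I choose the random forest method instead, I get an error and I don't know why. I've seen some questions about random forests and mice errors, but none of them match my case. My variables have many NAs.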
imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
Does anyone know why I'm getting this error?
Edit
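One thing that may be worth inspecting (an assumption, not a confirmed cause) is how many predictors `quickpred()` selects for each incomplete variable, since `method="rf"` fits a forest on those predictors only. The helper `predictors_per_variable()` below is hypothetical, and the `nhanes` data shipped with mice stands in for `data1`:

```r
library(mice)

# Hypothetical helper: for each variable with missing values, count how many
# predictors quickpred() selected to impute it.
predictors_per_variable <- function(data, ...) {
  pred <- quickpred(data, ...)                       # square 0/1 predictor matrix
  incomplete <- colnames(data)[colSums(is.na(data)) > 0]
  rowSums(pred)[incomplete]                          # named vector of counts
}

# Example on the nhanes data from mice (stand-in for data1)
counts <- predictors_per_variable(nhanes)
print(counts)

# Variables with 0 or 1 selected predictor are candidates to check first
names(counts)[counts <= 1]
```

Running the same helper on `data1` shows whether a variable such as `Vac` ends up with very few predictors under the default `quickpred()` thresholds.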
I tried changing all the variables to numeric instead of dummy variables, and it returned the same error plus some warnings:
impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac CliForm
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
In addition: There were 50 or more warnings (use warnings() to see the first 50)
50: In randomForest.default(x = xobs, y = yobs, ntree = 1, ... :
The response has five or fewer unique values. Are you sure you want to do regression?
Edit 1
I tried only 5 imputations on a smaller subset of the data with only 2000 rows, but got some different errors:
> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac Radio Origin Job Alc Smk Drugs Prison Commu Hmless Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign function call (arg 11)
In addition: Warning messages:
1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
2: In max(ncat) : no non-missing arguments to max; returning -Inf
3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion
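The warnings above ("NAs introduced by coercion", "five or fewer unique values ... regression") suggest looking at each column's type and number of distinct values. The sketch below uses a hypothetical helper `summarise_columns()` and the `nhanes2` data from mice as a stand-in for `data2`; treating low-cardinality numeric columns as factors is an assumption about what might help, not a verified fix:

```r
library(mice)

# Hypothetical helper: one row per column with its class, number of NAs and
# number of distinct observed values.
summarise_columns <- function(data) {
  data.frame(
    variable = names(data),
    class    = vapply(data, function(x) class(x)[1], character(1)),
    n_na     = vapply(data, function(x) sum(is.na(x)), integer(1)),
    n_unique = vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

summ <- summarise_columns(nhanes2)   # stand-in for data2
print(summ)

# Character columns may be coerced to NA inside randomForest; numeric columns
# with few distinct values (e.g. 0/1 dummies) are imputed by regression unless
# they are stored as factors.
suspect_char <- summ$variable[summ$class == "character"]
low_card     <- summ$variable[summ$class %in% c("numeric", "integer") &
                              summ$n_unique <= 5]
```

If `low_card` is non-empty for the real data, converting those columns with `factor()` before calling `mice()` makes randomForest treat them as classification targets.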