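I want to use a random forest method to impute missing values. I have read some papers claiming that MICE with random forests performs better than parametric MICE.

In my case, I have already run a model with the default mice method, obtained results, and worked with them. However, when I select the random forest method, I get an error and I don't know why. I have seen some questions about errors with random forests and mice, but they don't match my case. My variables have more than one NA.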

imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero

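Does anyone know why I am getting this error?

EDIT

I tried changing all variables to numeric instead of dummy variables, and it returned the same error along with some warnings: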

impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)

 iter imp variable
   1   1  Vac  CliForm
 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
 In addition: There were 50 or more warnings (use warnings() to see the first 50)

 50: In randomForest.default(x = xobs, y = yobs, ntree = 1,  ... :
   The response has five or fewer unique values.  Are you sure you want to do regression?

EDIT 1

I tried with only 5 imputations and a smaller subset of the data, just 2000 rows, and got some different errors:

> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac  Radio  Origin  Job  Alc  Smk  Drugs  Prison  Commu  Hmless  Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign   
 function call (arg 11)
 In addition: Warning messages:
 1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
 2: In max(ncat) : no non-missing arguments to max; returning -Inf
 3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion

1 Answer

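I also ran into this error when I had only one fully observed variable, and I am guessing that is the cause in your case too. My colleague Anoop Shah provided me with a fix (below), and Prof. van Buuren (the author of mice) has said he will include it in the next update of the package.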
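To check whether this applies to your data, a minimal base-R sketch like the one below counts the missing values per column and lists the fully observed ones. The data frame `toy` and its columns are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical toy data frame standing in for the real data
toy <- data.frame(
  Vac   = c(1, NA, 0, 1, NA),
  Radio = c(NA, 1, 1, NA, 0),
  Age   = c(34, 51, 27, 45, 62)   # the only fully observed column here
)

na_counts      <- colSums(is.na(toy))               # missing values per column
fully_observed <- names(na_counts)[na_counts == 0]  # columns without any NA

na_counts
fully_observed
```

With mice loaded, rowSums(quickpred(data1)) should give a similar picture from the other side: how many predictors each target variable actually ends up with.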

Type the following in R to be able to redefine the rf impute function:

fixInNamespace("mice.impute.rf", "mice")

The corrected function to paste in is then:

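mice.impute.rf <- function(y, ry, x, ntree = 100, ...) {
    # Grow at least one tree
    ntree <- max(1, ntree)
    # as.matrix() keeps the predictors in matrix form when x has a single
    # column; drop = FALSE also covers a single selected row
    xobs <- as.matrix(x[ry, , drop = FALSE])
    xmis <- as.matrix(x[!ry, , drop = FALSE])
    yobs <- y[ry]
    # Fit one tree and, for each missing case, collect the observed y values
    # that the tree groups with it (the donors)
    onetree <- function(xobs, xmis, yobs, ...) {
        fit <- randomForest(x = xobs, y = yobs, ntree = 1, ...)
        leafnr <- predict(object = fit, newdata = xobs, nodes = TRUE)
        nodes <- predict(object = fit, newdata = xmis, nodes = TRUE)
        donor <- lapply(nodes, function(s) yobs[leafnr == s])
        return(donor)
    }
    # One donor set per missing case and per tree
    forest <- sapply(1:ntree, FUN = function(s) onetree(xobs, xmis, yobs, ...))
    # Draw one donor value at random for each missing case
    impute <- apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s), 1))
    return(impute)
}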
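
For intuition about what the as.matrix() wrapping in the function above guards against, here is a small base-R sketch on a toy matrix (hypothetical values, not taken from the package). When the predictor matrix has a single column, plain subsetting collapses it to a vector, nrow() then returns NULL, and a check of the form shown in the error message, if (n == 0), has nothing to test.

```r
# Toy one-column predictor matrix (hypothetical values)
x  <- matrix(c(1.2, 3.4, 5.6, 7.8), ncol = 1)
ry <- c(TRUE, FALSE, TRUE, TRUE)      # TRUE where the target is observed

dropped  <- x[ry, ]                   # collapses to a plain numeric vector
restored <- as.matrix(x[ry, ])        # back to a 3 x 1 matrix
kept     <- x[ry, , drop = FALSE]     # never collapses in the first place

n <- nrow(dropped)                    # NULL: a vector has no rows
is.null(n)                            # TRUE
length(n == 0)                        # 0, so if (n == 0) fails with "argument is of length zero"
dim(restored)                         # 3 1
dim(kept)                             # 3 1
```

Using drop = FALSE has the side benefit of also keeping the matrix shape when only one row is selected, which as.matrix() alone would not.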
answered 2014-06-18T13:43:29.647