I'm relatively new to R and absolutely new to the forum, so I may be unaware of some posting rules (be kind).

The issue I have relates to the as.mids() function in the mice package by Stef van Buuren: the data in the new mids object are fine, but the number of imputations stored in the object is the original number + 1.

Example (making use of the original as.mids() example and data in the mice package):

imp <- mice(boys, print = FALSE, maxit = 1) #using the default number of imputations m = 5
X <- complete(imp, action = "long", include = TRUE)
test <- as.mids(X)

The created object now reports number of imputations = 6 (instead of 5). This also affects the analysis, as shown by the difference between

fit <- with(imp, lm(bmi ~ age))
round(summary(pool(fit)), 2)
fit2 <- with(test, lm(bmi ~ age))
round(summary(pool(fit2)), 2)

When looking through the code of as.mids() one minor change seems to solve this, but my R knowledge really requires a second opinion.

The original as.mids() code is below:

function (data, .imp = 1, .id = 2) 
{
  ini <- mice(data[data[, .imp] == 0, -c(.imp, .id)],
              m = max(as.numeric(data[, .imp])), maxit = 0)
  names <- names(ini$imp)
  if (!is.null(.id)) {
    rownames(ini$data) <- data[data[, .imp] == 0, .id]
  }
  for (i in 1:length(names)) {
    for (m in 1:(max(as.numeric(data[, .imp])) - 1)) {
      if (!is.null(ini$imp[[i]])) {
        indic <- data[, .imp] == m &
          is.na(data[data[, .imp] == 0, names[i]])
        ini$imp[[names[i]]][m] <- data[indic, names[i]]
      }
    }
  }
  return(ini)
}

Modifying the m argument in the mice() call that defines ini (the third line) seems to solve it. The change only accounts for the fact that max(as.numeric()) gives the number of levels of .imp, which includes the original data and is probably not what was meant:

as.mids.mod <- function(data, .imp = 1, .id = 2){
  ini <- mice(data[data[, .imp] == 0, -c(.imp, .id)], m = (max(as.numeric(data[, .imp])) -1), maxit = 0)
  names <- names(ini$imp)
  if (!is.null(.id)) {
    rownames(ini$data) <- data[data[, .imp] == 0, .id]
  }
  for (i in 1:length(names)) {
    for (m in 1:(max(as.numeric(data[, .imp])) - 1)) {
      if (!is.null(ini$imp[[i]])) {
        indic <- data[, .imp] == m & is.na(data[data[, .imp] == 0, names[i]])
        ini$imp[[names[i]]][m] <- data[indic, names[i]]
      }
    }
  }
  return(ini)
}
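
The off-by-one described above can be reproduced in base R without mice. This is only a minimal sketch, assuming the .imp column is a factor with levels 0 to 5; the vector imp_col is a hypothetical stand-in and is not taken from the package:

```r
# Hypothetical stand-in for the .imp column of a long-format data set:
# level "0" marks the original data, "1".."5" mark the five imputations.
imp_col <- factor(rep(0:5, each = 2))

# as.numeric() on a factor returns the internal integer codes (1..6),
# not the labels, so the maximum equals the number of levels.
n_codes <- max(as.numeric(imp_col))

# Converting the level labels instead recovers the values 0..5.
n_labels <- max(as.numeric(levels(imp_col)))

print(c(codes = n_codes, labels = n_labels))
```

Under that assumption, subtracting 1 from the maximum of the codes and taking the maximum of the level labels give the same number of imputations.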

Using as.mids.mod in the example now gives similar analysis results:

imp <- mice(boys, print = FALSE, maxit = 1)
data <- complete(imp, action = "long", include = TRUE)
test <- as.mids(data)
test2 <- as.mids.mod(data)

fit <- with(imp, lm(bmi ~ age))
round(summary(pool(fit)), 2)
fit3 <- with(test2, lm(bmi ~ age))
round(summary(pool(fit3)), 2)

Am I doing something wrong in how I use the function or in my fix, or should the as.mids() function be modified very slightly?

1 Answer

Thank you for raising this issue. The function as.mids2() in the working example below produces the desired mids object.

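as.mids2 <- function(data2, .imp = 1, .id = 2) {
  # Build an empty mids skeleton from the original rows (.imp == 0);
  # m is taken from the level labels of .imp, not from the factor codes
  ini <- mice(data2[data2[, .imp] == 0, -c(.imp, .id)],
              m = max(as.numeric(levels(data2[, .imp]))), maxit = 0)
  names <- names(ini$imp)
  if (!is.null(.id)) {
    rownames(ini$data) <- data2[data2[, .imp] == 0, .id]
  }
  # Copy the imputed values for each variable and each imputation
  for (i in 1:length(names)) {
    for (m in 1:(max(as.numeric(levels(data2[, .imp]))))) {
      if (!is.null(ini$imp[[i]])) {
        # Rows of imputation m where the original value was missing
        indic <- data2[, .imp] == m &
          is.na(data2[data2[, .imp] == 0, names[i]])
        ini$imp[[names[i]]][m] <- data2[indic, names[i]]
      }
    }
  }
  return(ini)
}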

require(mice)
imp <- mice(nhanes)
com <- complete(imp, "long", include = TRUE)

imp2 <- as.mids2(com)
com2 <- complete(imp2, "long", include = TRUE)
all(na.omit(com == com2))

Answered 2014-09-19T09:04:15.583