
I have a matrix that is half-sparse: half of all cells are blank (NA), so when I run `mice` it tries to impute all of them. I'm only interested in a subset.

Question: In the following code, how do I make "mice" only operate on the first two columns? Is there a clean way to do this using row-lag or row-lead, so that the content of the previous row can help patch holes in the current row?

library(mice)

set.seed(1)

#domain
x <- seq(from=0,to=10,length.out=1000)

#ranges
y <- sin(x) +sin(x/2) + rnorm(n = length(x))
y2 <- sin(x) +sin(x/2) + rnorm(n = length(x))

#kill 50% of cells
idx_na1 <- sample(x=1:length(x),size = length(x)/2)
y[idx_na1] <- NA

#kill more cells
idx_na2 <- sample(x=1:length(x),size = length(x)/2)
y2[idx_na2] <- NA

#assemble base data
my_data <- data.frame(x,y,y2)

#make the rest of the data
#(start at column 4 so that y2 in column 3 is not overwritten)
for (i in 4:50){
     my_data[,i] <- rnorm(n = length(x))
     idx_na2 <- sample(x=1:length(x),size = length(x)/2)
     my_data[idx_na2,i] <- NA
}

#imputation
est <- mice(my_data)

data2 <- complete(est)

str(data2[,1:3])

Places that I have looked for answers:


3 Answers


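I think what you are looking for can be done by modifying the `where` argument of the `mice` function. The `where` argument is a matrix (or data frame) with the same dimensions as the dataset you are imputing. By default, `where` equals `is.na(data)`: a cell is TRUE when the value is missing in the dataset and FALSE otherwise. This means that, by default, every missing value in the dataset is imputed. If you want to change this and impute only the values in a specific column (column 2 in my example), you can do the following: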

# Define arbitrary matrix with TRUE values when data is missing and FALSE otherwise
A <- is.na(data)
# Replace all the other columns which are not the one you want to impute (let say column 2)
A[,-2] <- FALSE 
# Run the mice function
imputed_data <- mice(data, where = A)
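
Applied to data shaped like the question's, the same idea might look like the sketch below. The toy data frame `dat`, the column names `noise1` to `noise3`, and the small `m` and `maxit` values are assumptions made for illustration. The sketch also zeroes the noise columns in `predictorMatrix`: as far as I understand the `mice` documentation, a predictor that still contains NA produces no imputations for the affected rows of the target, so leaving half-empty, unimputed columns in the model would leave most of `y` and `y2` unfilled.

```r
library(mice)

set.seed(1)
n <- 200
x <- seq(0, 10, length.out = n)

# Toy data (assumed): x is complete, y and y2 are the columns to impute,
# noise1..noise3 stand in for the columns that should be left alone.
dat <- data.frame(
  x      = x,
  y      = sin(x) + sin(x / 2) + rnorm(n),
  y2     = sin(x) + sin(x / 2) + rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n),
  noise3 = rnorm(n)
)
# Blank out half of the cells in every column except x
for (j in names(dat)[-1]) dat[sample(n, n / 2), j] <- NA

target <- c("y", "y2")

# TRUE only for the missing cells of the target columns
where_mat <- is.na(dat)
where_mat[, !(colnames(where_mat) %in% target)] <- FALSE

# Keep only x, y and y2 as predictors; drop the half-empty noise columns
pred <- make.predictorMatrix(dat)
pred[, !(colnames(pred) %in% c("x", target))] <- 0

imp  <- mice(dat, where = where_mat, predictorMatrix = pred,
             m = 1, maxit = 5, printFlag = FALSE, seed = 1)
done <- complete(imp, 1)

# Remaining NA count per column
colSums(is.na(done))
```

Because the noise columns are excluded from both `where` and the predictors, they pass through untouched and `mice` only has to fit models for `y` and `y2`.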
answered 2018-08-03T18:39:24.877

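Instead of the `where` argument, a faster approach may be to use the `method` argument. You can set this argument to `""` for the columns/variables you want to skip. The downside is that the method is then no longer determined automatically. So: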

imp <- mice(data,
            method = ifelse(colnames(data) == "your_var", "logreg", ""))

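But you can get the default methods from the documentation:

defaultMethod

... By default, the method uses `pmm`, predictive mean matching (numeric data); `logreg`, logistic regression imputation (binary data, factor with 2 levels); `polyreg`, polytomous regression imputation for unordered categorical data (factor > 2 levels); `polr`, proportional odds model (ordered, > 2 levels).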
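
One way to avoid picking the methods by hand is to let `make.method()` (available in mice 3.0 and later, as far as I know) choose the defaults and then blank out the columns to skip. This is a sketch under assumptions: the toy data frame `dat` and its column names are made up, and the skipped column is also removed as a predictor, since its remaining NA values would otherwise block imputation of the other columns.

```r
library(mice)

set.seed(2)
n <- 200

# Toy data (assumed): one numeric target, one binary factor target,
# and one column whose missing values should be left alone.
dat <- data.frame(
  x     = seq(0, 10, length.out = n),
  y     = rnorm(n),
  grp   = factor(sample(c("a", "b"), n, replace = TRUE)),
  noise = rnorm(n)
)
for (j in c("y", "grp", "noise")) dat[sample(n, n / 4), j] <- NA

# Default method per column, chosen from the column type
meth <- make.method(dat)
print(meth)

# Skip the column we do not want imputed
skip <- "noise"
meth[skip] <- ""

# Do not use the skipped (still incomplete) column as a predictor
pred <- make.predictorMatrix(dat)
pred[, skip] <- 0

imp  <- mice(dat, method = meth, predictorMatrix = pred,
             m = 1, maxit = 5, printFlag = FALSE, seed = 2)
done <- complete(imp, 1)

# Remaining NA count per column
colSums(is.na(done))
```

Compared with the `ifelse()` one-liner, this keeps the automatically chosen method for every column that is still imputed.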

answered 2020-02-06T14:17:05.127

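Your question is not entirely clear to me. Are you saying that you want to operate on only two columns? In that case `mice(my_data[,1:2])` will work. Or do you want to use all of the data but fill in the missing values of only certain columns? For that, I would just create an indicator matrix along the following lines: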

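# Indicator matrix: TRUE where the original data are missing
isNA <- is.na(my_data)
est <- mice(my_data)

# Assumption: only the imputations in the first two columns are wanted
keep <- 1:2

# In each completed data set, put the NAs back in all other columns
completed <- lapply(seq_len(est$m), function(k) {
  d <- complete(est, k)
  d[, -keep][isNA[, -keep]] <- NA
  d
})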

As for your last question, "Can I use `mice` for rolling data imputation?", I believe the answer is no. But you should double-check the documentation.

answered 2017-02-10T21:51:37.067