r - How to select rows with a certain missingness pattern?

Question

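So I have a dataset with a lot of missing values, and I want to split the data by missingness pattern. I found the package mice very convenient for summarizing missing-value patterns. However, when I try to select the rows that have a given missingness pattern, far fewer rows are selected than the count reported in the missing-pattern matrix.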

My code is as follows.

To get the missingness patterns:

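library(mice)
# md.pattern returns a matrix whose row names are the pattern frequencies; its last row
# holds per-variable totals and its last column the number of missing variables per pattern.
# Drop both, convert to a data frame, and keep the frequency in a `freq` column.
pattern = md.pattern(data)
freq = as.integer(rownames(pattern)[-nrow(pattern)])  # numeric, so sorting is not lexicographic
pattern = data.frame(pattern[-nrow(pattern), -ncol(pattern), drop = FALSE], row.names = NULL)
pattern$freq = freq
pattern = pattern[order(freq, decreasing = TRUE), ]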

However, when I try to count the rows that match a pattern from pattern, the count is much smaller.

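count = 0
vars = seq_len(ncol(pattern) - 1)  # pattern columns without `freq`
for (i in 1:nrow(data)) {
    # match the missingness of the entire row against the first pattern
    # (md.pattern codes observed values as 1 and missing values as 0)
    if (all(!is.na(data[i, names(data)[vars]]) == pattern[1, vars])) {
        count = count + 1
    }
}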
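One possible cause, offered as an assumption to check rather than a diagnosis: as far as I can tell, md.pattern sorts its columns by increasing number of missing values, so column j of pattern is generally not names(data)[j], and comparing by position as the loop does can undercount depending on which pattern is compared. The sketch below (rows_with_pattern is a hypothetical helper name) matches columns by name and compares all rows at once instead of looping over 70,000 rows.

```r
# Select the rows of `data` whose missingness matches one pattern.
# `pat` is a named vector coded like md.pattern (1 = observed, 0 = missing);
# its names are matched to columns BY NAME, not by position.
# `rows_with_pattern` is a hypothetical helper used for illustration.
rows_with_pattern <- function(data, pat) {
  vars <- names(pat)
  stopifnot(all(vars %in% names(data)))
  observed <- !is.na(data[, vars, drop = FALSE])        # logical matrix: rows x vars
  want <- matrix(as.logical(pat), nrow = nrow(data),
                 ncol = length(vars), byrow = TRUE)      # the pattern repeated per row
  data[rowSums(observed == want) == length(vars), , drop = FALSE]
}

# Usage on the question's example data, with the pattern's columns listed
# in a different order from the data (as md.pattern may produce)
dat <- data.frame(
  V1 = c(1, NA, NA, 3, 3, 4),
  V2 = c(NA, 3, 3, 3, NA, NA),
  V3 = c(3, 23, 90, 2, 2, 7),
  V4 = c(5, 2, 7, 34, 1, 3),
  V5 = c(2, 9, 5, NA, 3, 1)
)
pat <- c(V3 = 1, V4 = 1, V5 = 1, V1 = 1, V2 = 0)  # only V2 missing
v2_missing <- rows_with_pattern(dat, pat)
```

With the pattern table from the question, pat could be built from one of its rows, e.g. unlist(pattern[1, setdiff(names(pattern), "freq")]), so each value keeps its column name.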

Does anyone know what is going wrong? Thanks!

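The data have many variables (107 in total) and more than 70,000 observations. This code works fine on the nhanes example data from the mice package, but it goes wrong on my own data file.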

For example:

V1 V2 V3 V4 V5
 1 NA  3  5  2
NA  3 23  2  9
NA  3 90  7  5
 3  3  2 34 NA
 3 NA  2  1  3
 4 NA  7  3  1
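To make the example concrete, here is a minimal base-R sketch (no mice needed) that builds the table above as a data frame, turns each row's missingness into a string key, and counts rows per key. The names dat, key, pattern_counts and rows_v2_missing are illustrative; the 1 = observed, 0 = missing coding mirrors md.pattern. Because the key is built from the data's own column order, the selected rows cannot get out of step with the counts.

```r
# Example data from the question (NA marks a missing value)
dat <- data.frame(
  V1 = c(1, NA, NA, 3, 3, 4),
  V2 = c(NA, 3, 3, 3, NA, NA),
  V3 = c(3, 23, 90, 2, 2, 7),
  V4 = c(5, 2, 7, 34, 1, 3),
  V5 = c(2, 9, 5, NA, 3, 1)
)

# One key per row: "1" = observed, "0" = missing, in the data's column order
key <- apply(!is.na(dat), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of rows per missingness pattern
pattern_counts <- table(key)

# All rows with the pattern "only V2 missing"
rows_v2_missing <- dat[key == "10111", ]
```

Here rows 1, 5 and 6 share the key "10111", rows 2 and 3 share "01111", and row 4 alone has "11110".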

score 3 · Accepted Answer

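Anyway, I checked the original source of md.pattern in the mice package. It is based on Schafer's prelim.norm function rather than checking the missingness pattern row by row.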

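I found that count in the plyr package does the trick. I wrote this function to return the topn most frequent missingness patterns in a dataset, where x is the data frame. It works well in my case.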

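library(plyr)
miss.pattern <- function(x, topn) {
  # find missingness patterns, 1 represents missing
  r <- 1 * data.frame(is.na(x))
  # plyr::count tallies each distinct row (the namespace avoids dplyr::count masking it);
  # the tally is returned in a column named `freq`
  pattern <- data.frame(plyr::count(r))
  pattern <- pattern[order(-pattern$freq), ]
  # head() avoids NA rows when topn exceeds the number of patterns
  return(head(pattern, topn))
}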
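For readers who would rather not load plyr, here is a base-R sketch of the same idea, plus a helper that returns the rows belonging to one of the returned patterns, which was the original goal. Both function names (miss_pattern_base, rows_in_pattern) are hypothetical, and the 1 = missing coding follows miss.pattern above.

```r
# Tabulate missingness patterns (1 = missing, 0 = observed) and return the
# topn most frequent ones, with their row counts in a `freq` column.
miss_pattern_base <- function(x, topn) {
  r <- as.data.frame(1L * is.na(x))               # 0/1 indicator per cell
  key <- do.call(paste, c(r, sep = ""))           # one pattern string per row
  counts <- sort(table(key), decreasing = TRUE)   # rows per pattern, most frequent first
  first <- match(names(counts), key)              # one representative row per pattern
  out <- r[first, , drop = FALSE]
  out$freq <- as.integer(counts)
  rownames(out) <- NULL
  head(out, topn)
}

# Return the rows of x whose missingness equals one row of the pattern table.
rows_in_pattern <- function(x, pattern_row) {
  vars <- setdiff(names(pattern_row), "freq")
  miss <- 1L * is.na(x[, vars, drop = FALSE])
  target <- unlist(pattern_row[1, vars])          # named 0/1 vector, same column order
  keep <- apply(miss, 1, function(r) all(r == target))
  x[keep, , drop = FALSE]
}

# Usage on the question's example data
dat <- data.frame(
  V1 = c(1, NA, NA, 3, 3, 4),
  V2 = c(NA, 3, 3, 3, NA, NA),
  V3 = c(3, 23, 90, 2, 2, 7),
  V4 = c(5, 2, 7, 34, 1, 3),
  V5 = c(2, 9, 5, NA, 3, 1)
)
top <- miss_pattern_base(dat, 2)
top_rows <- rows_in_pattern(dat, top[1, ])
```

Since the selection reuses the same 0/1 coding and column names as the tabulation, nrow(top_rows) should always equal the corresponding freq.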
