r - lm called from within dlply throws "0 (non-NA) cases" error [r]

Question

I'm using dlply() with a custom function that averages the slopes of lm() fits on data containing some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases".

The error only occurs when I call dlply with two splitting variables; splitting by a single variable works fine.

Annoyingly, I can't reproduce the error with a simple dataset, so I've posted the problem dataset to my Dropbox.

Here is the code, minimized as far as possible while still producing the error:

library(plyr)  # provides dlply() and ddply()

masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")

workingData <- data.frame(sample = masterData$sample,
                      substrate = masterData$substrate,
                      el1 = masterData$elapsedHr1,
                      F1 = masterData$r1 - masterData$rK)

#This function is trivial as written; in reality it takes the average of many slopes
meanSlope <- function(df) {
     lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
     slope1 <- lm1$coefficients[2]
     meanSlope <- mean(c(slope1)) 
}

lsGOOD <- dlply(workingData, .(sample), meanSlope) #works fine

lsBAD <- dlply(workingData, .(sample, substrate), meanSlope) #throws error

Thanks in advance for any insight.

score 5 · Accepted Answer

You have missing covariate values for several of your cross-classifications:

with(masterData, table(sample, substrate, r1mis = is.na(r1)))
# snipped the non-missing reports
, , r1mis = TRUE

      substrate
sample 1 2 3 4 5 6 7 8
    3  0 0 0 0 0 0 0 0
    4  0 0 0 0 0 0 0 0
    5  0 0 0 0 0 0 0 0
    6  0 0 0 0 0 0 0 0
    7  0 0 0 0 0 0 3 3
    8  0 0 0 0 0 0 0 3
    9  0 0 0 0 0 0 0 3
    10 0 0 0 0 0 0 0 3
    11 0 0 0 0 0 0 0 3
    12 0 0 0 0 0 0 0 3
    13 0 0 0 0 0 0 0 3
    14 0 0 0 0 0 0 0 3
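The error message itself comes from lm.fit(), which stops when no rows are left after NA handling. A minimal sketch that reproduces it, assuming a made-up subset (not taken from the posted dataset) in which every F1 and el1 value is NA:

```r
# Hypothetical data: one sample/substrate subset where every value is missing
allMissing <- data.frame(el1 = rep(NA_real_, 3), F1 = rep(NA_real_, 3))

# na.omit drops all three rows, so lm() hands lm.fit() zero cases and it stops;
# tryCatch() captures the error message instead of aborting
errMsg <- tryCatch(
  {
    lm(F1 ~ el1, data = allMissing, na.action = na.omit)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(errMsg)
```

Splitting by both sample and substrate creates small subsets like this one, which is why the error only appears with two splitting variables.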

This lets you skip the subsets of this particular dataset that don't have enough data:

meanSlope <- function(df) {
  # Skip subsets with fewer than two non-missing el1 values: no slope can be estimated
  if (sum(!is.na(df$el1)) < 2) {
    return(NA)
  }
  lm1 <- lm(F1 ~ el1, data = df, na.action = na.omit)
  slope1 <- lm1$coefficients[2]
  mean(c(slope1))
}

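Note, though, that this relies on missingness in one particular covariate (el1). A more robust solution is to use try() to catch the error and convert it to NA.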

?try
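A minimal sketch of the try() approach, assuming a hypothetical function name meanSlopeSafe and made-up toy data; base R's split() and lapply() stand in for dlply() here only so the sketch runs without plyr:

```r
# Hypothetical name: a variant of meanSlope that turns lm() errors into NA
meanSlopeSafe <- function(df) {
  # With silent = TRUE, try() returns an object of class "try-error" instead of stopping
  fit <- try(lm(F1 ~ el1, data = df, na.action = na.omit), silent = TRUE)
  if (inherits(fit, "try-error")) {
    return(NA_real_)
  }
  # The slope is the second coefficient; unname() drops the "el1" label
  unname(coef(fit)[2])
}

# Hypothetical toy data: one usable subset (3/1) and one with no complete cases (7/8)
toy <- data.frame(
  sample    = c(3, 3, 3, 7, 7),
  substrate = c(1, 1, 1, 8, 8),
  el1       = c(0, 1, 2, NA, NA),
  F1        = c(1, 3, 5, NA, NA)
)

# Split by both keys, dropping empty combinations, and fit each subset
slopes <- lapply(split(toy, list(toy$sample, toy$substrate), drop = TRUE), meanSlopeSafe)
print(slopes)
```

In the original code, the same function could be passed as dlply(workingData, .(sample, substrate), meanSlopeSafe). Unlike the el1 check above, this catches the error whichever variable is missing.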

score 2 · Accepted Answer

Following up on my comment:

my.func <- function(df) {
  data.frame(el1=all(is.na(df$el1)), F1=all(is.na(df$F1)))
}

ddply(workingData, .(sample, substrate), my.func)

shows that you have many subsets in which F1 and el1 are entirely NA. (In fact, whenever one of them is all NA, so is the other!)
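A base-R variant of the same check, offered as a swapped-in approach (split() plus complete.cases() rather than ddply()), counts how many rows in each sample/substrate subset have both el1 and F1 present. The helper name and toy data are hypothetical:

```r
# Hypothetical helper: complete (el1, F1) pairs per sample/substrate subset
countCompleteCases <- function(df) {
  groups <- split(df, list(df$sample, df$substrate), drop = TRUE)
  sapply(groups, function(g) sum(complete.cases(g$el1, g$F1)))
}

# Hypothetical toy data: subset 7/8 has no complete cases
toy <- data.frame(
  sample    = c(3, 3, 3, 7, 7),
  substrate = c(1, 1, 1, 8, 8),
  el1       = c(0, 1, 2, NA, NA),
  F1        = c(1, 3, 5, NA, NA)
)

counts <- countCompleteCases(toy)
print(counts)

# Subsets with fewer than two complete cases cannot yield a slope
tooSmall <- names(counts)[counts < 2]
print(tooSmall)
```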
