r - 使用 glmer/lme4 和 predictInterval 时，您将分组因子设置为什么？

Question

问题：使用多级（混合效应）模型，不确定将分组变量设置为什么，以便使用 merTools 的 predictInterval 函数从 glmer 模型生成测量的组级变量的预测概率。

目标：从“第二级”组级变量中生成一系列值的预测概率和 SE/CI。

寻求：关于如何正确执行此操作的建议或其他建议以生成预测概率和 CI 是来自 glmer 模型的组级别变量的值范围。

library(lme4)
library(merTools)
library(ggplot2)

hier_data <- data_frame(pass = sample(c(0, 1), size = 1000, replace = T),
                        wt = rnorm(1000),
                        ht = sample(1:5, size = 1000, replace = T, prob = c(.1, .1, .6, .1, .1)),
                        school_funding = rnorm(1000),
                        school = rep(c("A", "B", "C", "D", "E"), each = 200))

mod <- glmer(pass ~ wt + ht + school_funding + (1 | school),
             family = binomial("logit"), data = hier_data)

### Without school: error
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100))

pp <- cbind(ndata, predictInterval(merMod = mod,
                      newdata = ndata,
                      type = "probability"))

### Problem, when adding school variable: which school?
ndata <- data.frame(wt = median(hier_data$wt),
                    ht = median(hier_data$ht),
                    school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100),
                    school = "A")

pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata,
                                   type = "probability"))

ggplot(pp, aes(x = school_funding, y = fit)) +
    geom_point() +
    geom_errorbar(aes(ymin = lwr, ymax = upr))

score 0 · Accepted Answer

看来您要实现的是effects变量的绘图，具有快速的预测间隔。首先请注意，predictInterval 不包含方差参数估计值中的不确定性，theta。如果需要更准确的置信区间，您应该使用其中bootMer描述的函数，?bootMer其中使用自举来估计不确定性。然而，随着模型大小和复杂性的增加，它可能根本不可行。或者，该effects软件包包含说明merMod对象效果的功能（但是文档简直是残暴的）。

一般来说，在说明merMod物体的效果时，一个问题是“哪些效果？”。您是否对边际效应或条件效应（例如随机效应的可变性？）感兴趣。如果您的模型仅包含一阶随机效应（无随机斜率），并且您对固定效应系数的不确定性或对条件均值的影响感兴趣，则可以使用任何学校并指定which = "fixed"为predictInterval

pp <- cbind(ndata, predictInterval(merMod = mod,
                                   newdata = ndata, #<= any school chosen
                                   type = "probability",
                                   which = "fixed"))

请注意，大小将取决于选择的学校和标准模型中的剩余系数，因此不是因果关系。

如果您对边际效应感兴趣，有多种方法可以逼近这一点。最佳方案是引导边际均值的预测值。或者，如果您的分组变量中独立组的数量足够“大”，您可以（也许）平均预测跨组的间隔，如下图所示

newData <- expand.grid(wt = median(hier_data$wt), 
                       ht = median(hier_data$ht),
                       school = levels(hier_data$school),
                       school_funding = seq(from = min(hier_data$school_funding), 
                                            to = max(hier_data$school_funding), 
                                            length.out = 100))
pp <- predictInterval(merMod = mod,
                       newdata = newData,
                       type = "probability")
#Split predictions by every column but school
# And calculate estimated means
predictions <- do.call("rbind", lapply(split(as.data.frame(pp), 
                                             newData[, !names(newData) == "school"]), 
                                       colMeans))
rownames(predictions) <- 1:nrow(predictions)
#create a plot
ggplot(as.data.frame(cbind(predictions, funding = newData$school_funding[newData$school == "A"])), 
       aes(x = funding, y = fit, ymax = upr, ymin = lwr)) + 
    geom_point() +
    geom_errorbar()

对于这个例子，模型通常是奇异的并且包含很少的组，因此结果不太可能是边际效应的一个很好的估计量，但除了从中提取模拟之外predictInterval可能就足够了。在随机效应中具有更多分组级别的模型可能会有所改善。predictInterval似乎没有直接针对这种情况采用一种方法。

查看边际效应的另一种方法是假设形式的边际平均值1/(1+exp(-eta)（通常假设为随机效应的新分组）。这不是直接在predictInterval函数中实现的，而是可以通过从线性预测器中减去随机效应，只估计固定效应的随机性来实现，如下所示：

pp <- predictInterval(merMod = mod,
                      newdata = ndata, #<= any school chosen
                      type = "linear.prediction",
                      which = "fixed")
#remove random effects
pp <- sweep(pp, 1, predict(mod, newdata = ndata, random.only = TRUE), "-")
pp <- 1/(1+exp(-pp))

然后可以使用类似的方法绘制。对于较少的群体，这可能是边际均值的更好预测指标（？，有人可能会在这里纠正我）。

在这两种情况下，添加一点 x-jitter 可能会改善绘图。

在所有情况下，bolker 和其他人对GLMM FAQ的引用中都可能有一些金块。

r - 使用 glmer/lme4 和 predictInterval 时，您将分组因子设置为什么？

1 回答 1

Related

Reference