r - 使用 gamm4 选择 beta 和准系列的模型

Question

我有两个符合beta（也称为betar）和Poisson族的响应，我正在研究beta分别使用和准族（计数数据过度分散）拟合加性混合模型。

我知道我可以使用同时接受和准家庭的包中的gamm函数，但是我正在考虑它使用 PQL，并且mgcvbetaAIC报告的对于比较模型没有用 - 这是我分析的主要目标。

在计数响应的情况下，我知道它QAIC已用于对过度分散的混合模型进行排名/比较，但我找不到任何说明它适用于过度分散的 GAMM 的内容。

我理解这些可能是两个问题合二为一，但它们都有一个共同的主题，即大家庭的模型选择，并且可能有不同的解决方案。下面我为每种情况提供可重复的示例。

##generate data
library(gamm4)
library(mgcv)
dat <- gamSim(1,n=400,scale=2)
dat<-subset(dat, select=c(x0,x1,x2,x3,f) )
dat$g <- as.factor(sample(1:20,400,replace=TRUE))#random factor
dat$yb<-runif(400)#yb ranges between 0-1 hence fitted with beta family
dat$f <- dat$f + model.matrix(~ g-1)%*%rnorm(20)*2
dat$yp <- rpois(400,exp(dat$f/7))#y2 is counts hence poisson family

#beta family example with gamm function (this works - however not sure if the subsequent model comparisons are valid!) 
m1b<-    gamm(yb~s(x0)+s(x1)+s(x2)+s(x3),family=betar(link='logit'),data=dat,random=list(g=~1))
m2b<-gamm(yb~s(x1)+s(x2)+s(x3),family=betar(link='logit'),data=dat,random=list(g=~1))
m3b<-gamm(yb~s(x0)+s(x2)+s(x3),family=betar(link='logit'),data=dat,random=list(g=~1))

#AIC to compare models  
AIC(m1b,m2b,m3b)

#try the same using gamm4 (ideally)- it obviously fails with beta family.
m<-gamm4(yb~s(x0)+s(x1)+s(x2)+s(x3),family=betar(link='logit'),data=dat,random = ~ (1|g)) 

##Example with quassi family - yp response is overdispersed count data (may not be overdispered in this example
#example using gamm function
m1p<-gamm(yp~s(x0)+s(x1)+s(x2)+s(x3),family = quasipoisson,data=dat,random=list(g=~1))
m2p<-gamm(yp~s(x1)+s(x2)+s(x3),family = quasipoisson,data=dat,random=list(g=~1))
m3p<-gamm(yp~s(x0)+s(x2)+s(x3),family = quasipoisson,data=dat,random=list(g=~1))

#AIC to compare models
AIC(m1p,m2p,m3p)

#again the example with using gamm4 function will not work as it doesnt accept quassi falimies 
m<-gamm4(yp~s(x0)+s(x1)+s(x2)+s(x3),family = quasipoisson,data=dat,random = ~ (1|g))

score 1 · Accepted Answer

你在这里有很多问题，但我会尽力解决它们。基本上，您希望将参数统计模型与

随机效应 ( nlme, lme4)
指数族分布... ( MASS::glmmPQL, lme4::glmer)
...过度分散...
... 或指数族以外的分布，例如 Beta 分布 ( VGAM, betareg)
加法模型/样条曲线 ( splines) ...
...或惩罚回归样条曲线，它会自动调整平滑项的复杂性
...使用真实似然模型而不是边际或准似然模型（例如 GEE、PQL），因此您可以进行经典推理

上面的每个指定问题都会为模型拟合练习增加 1 个或多个“难度点”……通常，一旦你的分数超过 +3 左右，你就必须找到一种妥协的方法或在一些问题上采取捷径。你想要的东西。你已经正确地识别gamm并gamm4做一些你想做的事情，但你不能得到一切。一些建议：

过度分散

处理过度分散的一种方法是使用观察级随机效应，例如

dat$obs <- factor(seq(nrow(dat)))
m <- gamm4(yp~s(x0)+s(x1)+s(x2)+s(x3),
           family = poisson,data=dat,random = ~ (1|g)+(1|obs))

如果您认为这是有道理的，另一种选择是自己调整过度分散，例如：

m0 <- gamm4(yp~s(x0)+s(x1)+s(x2)+s(x3),family = poisson,data=dat,random = ~ (1|g))

首先计算过度离散：

(phi <- sum(residuals(m0$gam,type="pearson")^2/df.residual(m0$gam)))
## [1] 1.003436

（如果我们重复这个练习，m0$mer我们得到 0.9939696：结果几乎完全等于 1，因为我们首先从泊松分布生成数据......）

(qaic <- -2*logLik(m0$mer)/phi + 2*lme4:::npar.merMod(m0$mer))

注意，我猜测以这种方式从 gamm4 拟合的各个组件构建可能性等是有意义的；使用风险自负

替代分布

glmmADMB和包（都在glmmTMBCRAN 外，但可以通过 Google 找到......）都可以处理混合 Beta 模型。他们不能做惩罚回归样条，但你可以通过splines::ns()or使用常规样条splines::bs()（但你必须决定适当的复杂程度——也许你可以从初步gamm或mgcv拟合中猜测......）

library(glmmADMB)
library(splines)
m3b <- glmmadmb(yb~ns(x0,2)+ns(x1,2)+ns(x2,5)+ns(x3,2)+(1|g),
       family="beta",link="logit",data=dat)

该glmmTMB软件包原则上可以这样做：

library(glmmTMB)
m2b <- glmmTMB(yb~ns(x0,2)+ns(x1,2)+ns(x2,5)+ns(x3,2)+(1|g),
       family=list(family="beta",link="logit"),data=dat)

但该软件包正在开发中，当前的结果集没有意义——所以我现在可能会犹豫是否要使用它。

r - 使用 gamm4 选择 beta 和准系列的模型

1 回答 1

过度分散

替代分布

Related

Reference