r - 包括泊松模型中的偏移量是候选模型的两倍

Question

修改问题和代码 (2020-07-10)

随着分析的进行，这个问题已经进行了几次编辑。我在顶部包含了最新信息，但对于那些被困在其他步骤的人，请参阅下面的信息。

回顾：我正在对几个连续变量（A、B 和 C）、一个分类变量（YN）和一个随机截距（ID）之间的关系进行建模。我已经包含了一个偏移量 (T)，因为计数数据是在不同持续时间的行为观察期间收集的。还有一个基于随机截距的零通胀术语。我正在比较该模型的所有子集以确定最适合的模型。

目前，当我尝试比较所有子集时，疏通函数会引发错误。

#simulated data
df <- data.frame(
  count = sample(size = 300, c(0:6), replace = TRUE, prob = c(0.7, 0.15, 0.05, 0.04, 0.03, 0.02, 0.01)),
  A = sample(size = 300, x = 24:65, replace = TRUE),
  B = runif(n = 300, min = 7, max = 19),
  C = rbeta(n = 300, shape1 = 1, shape2 = 25)*100,
  YN = sample(size = 300, c("Y", "N"), replace = TRUE, prob = c(0.2, 0.8)),
  T = runif(n = 300, min = 10, max = 20),
  ID = sample(size = 300, letters, replace = TRUE)
)
  
#standardize explanatory variables
dfSTD <- stdize(df, 
                omit.cols = c("count", "YN", "T", "ID"), 
                center = TRUE, scale = TRUE) 

#specify full model with standardized covariates for analysis
full <- stdizeFit(glmmTMB(count ~ z.A + z.B + z.C + YN + offset(log(T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                  data = dfSTD, 
                  which = "formula", 
                  evaluate = FALSE)

#compare all possible model subsets
all <- dredge(full, 
              beta = "sd", 
              fixed = "cond(offset(log(T)))")

Error in nobs.default(global.model) : no 'nobs' method is available

我正在使用 R 版本 3.6.1。

我正在对几个连续变量（A、B 和 C）、一个分类变量（YN）和一个随机截距（ID）之间的关系进行建模。我已经包含了一个偏移量 (T)，因为计数数据是在不同持续时间的行为观察期间收集的。还有一个基于随机截距的零通胀术语。

我正在比较该模型的所有子集以确定最适合的模型。

我的代码如下所示：

full <- glmmTMB(count ~ A + B + C + YN + offset(log(T)) + (1|ID), 
                zi = ~ (1|ID),
                family = poisson(),
                data = df, 
                na.action = "na.fail")
View(dredge(full, beta = sd))

但是当我查看表格时，似乎模型是重复的，即有和没有偏移量。该表包括 32 个模型，当我省略偏移量时，只有 16 个模型。加上系数非常相似。（显示此表的子集）

我已经尝试更改我的代码，以便始终包含如下所示的偏移量，但这会导致警告。

    full <- glmmTMB(count ~ A + B + C + YN + (1|ID), 
                    offset = log(T), 
                    zi = ~ (1|ID),
                    family = poisson(),
                    data = df, 
                    na.action = "na.fail")
Warning messages:
1: In (function (start, objective, gradient = NULL, hessian = NULL,  :
  NA/NaN function evaluation
repeated 9 times

编辑 2020/06/16

当偏移量作为一个术语包含时，我已经用完整的模型摘要更新了这篇文章：

 Family: poisson  ( log )
Formula:          count ~ A + B + C + YN + offset(log(T)) +  
    (1 | ID)
Zero inflation:      ~(1 | ID)
Data: df

     AIC      BIC   logLik deviance df.resid 
   286.4    317.4   -135.2    270.4      344 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 ID   (Intercept) 0.1006   0.3172  
Number of obs: 352, groups:  name, 64

Zero-inflation model:
 Groups Name        Variance  Std.Dev. 
 ID   (Intercept) 2.089e-08 0.0001445
Number of obs: 352, groups:  name, 64

Conditional model:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.582134   0.382926 -11.966  < 2e-16 ***
A              0.003517   0.170809   0.021  0.98357    
B              0.356596   0.162703   2.192  0.02840 *  
C              0.594737   0.095198   6.247 4.17e-10 ***
YN             -1.397989   0.538510  -2.596  0.00943 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4744     0.7323  -0.648    0.517

当偏移量作为公式之外的单独参数包含在内时：

Family: poisson  ( log )
Formula:          count ~ A + B + C + YN + (1 | ID) #NOTE: offset isn't listed here despite being included when I specified the model above
Zero inflation:      ~(1 | ID)
Data: df
 Offset: log(duration)

     AIC      BIC   logLik deviance df.resid 
   286.4    317.4   -135.2    270.4      344 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 ID   (Intercept) 0.1006   0.3172  
Number of obs: 352, groups:  name, 64

Zero-inflation model:
 Groups Name        Variance  Std.Dev. 
 ID   (Intercept) 2.089e-08 0.0001445
Number of obs: 352, groups:  name, 64

Conditional model:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.582134   0.382926 -11.966  < 2e-16 ***
A              0.003517   0.170809   0.021  0.98357    
B              0.356596   0.162703   2.192  0.02840 *  
C              0.594737   0.095198   6.247 4.17e-10 ***
YN             -1.397989   0.538510  -2.596  0.00943 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4744     0.7323  -0.648    0.517

编辑 2020/06/24

修改后的疏浚功能解决了在所有模型中保留偏移变量的问题。

View(dredge(full, 
            beta = "sd"), #note sd was not in quotes above!
            fixed = "cond(offset(log(time)))") #retains offset in all models

然而，这又引发了一个新问题：

Warning message:
In dredge(full, beta = "sd", fixed = "cond(offset(log(time)))") :
  do not know how to standardize coefficients of 'glmmTMB', argument 'beta' ignored

@Kamil Barton 在下面建议我需要使用罢工和 stdizeFit 来解决此警告消息。但到目前为止，如果没有其他错误消息，我无法让此代码正常工作。此外，我担心标准化我的响应变量（如下所示）是错误的方法，并且由于标准化时间变量会导致负值，因此无法再将其记录在模型中（产生 NA 值）。

无论如何，这是我尝试使用与帮助文件中几乎相同的方法来实施 Kamil 建议的方法：

dfSTD <- stdize(select(df, counts, A, B, C, YN, T, ID),
                center = TRUE, scale = TRUE) 

full <- stdizeFit(glmmTMB(z.counts ~ z.A + z.B + z.C + z.YN + 
                              offset(log(z.T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                   which = "formula", evaluate = FALSE)

Error in stdizeFit(glmmTMB(z.counts ~ z.A + z.B + z.C + z.YN + : 
  argument "data" is missing, with no default

任何关于解决这些最新错误的想法，无论是卡米尔建议的方法还是我上面所做的，都将不胜感激。

编辑 2020/07/10

我已经能够使用以下代码块解决上述问题：

dfSTD <- stdize(df, omit.cols = c("counts", "YN", "T", "ID"), 
                center = TRUE, scale = TRUE) 

full <- stdizeFit(glmmTMB(counts ~ z.A + z.B + z.C + YN + 
                                  offset(log(T)) + (1|ID),
                          zi = ~ (1|ID),
                          family = poisson(),
                          data = dfSTD,
                          na.action = "na.fail"),
                  data = dfSTD, which = "formula", evaluate = FALSE)

但是，现在当我使用疏通功能时，我又遇到了另一个错误。

allEVCounts <- dredge(full, 
                      beta = "sd", 
                      fixed = "cond(offset(log(T)))")

Error in nobs.default(global.model) : no 'nobs' method is available

在确定如何以及在何处解决“nobs”方法的任何帮助将不胜感激。

score 1 · Accepted Answer

dredge当偏移项包含在公式中时，它以与固定模型项相同的方式处理偏移项。要在所有模型中保留偏移量，请添加一个参数fixed = "<offset term name>"，在本例中为offset(log(T))：

dredge(full, beta = "sd", fixed = "cond(offset(log(T)))")

请注意，在您的代码中beta = sd（不带引号）被视为beta = "none"因为您传递了一个函数 sd而不是字符串"sd"，并且这不是一个可接受的值。

有关“NA/NaN 函数评估”的警告来自glmmTMB，某些子模型可能不会与偏移量收敛。设置options(warn = 1)（立即警告消息）并使用dredgewithtrace = TRUE找出哪些消息。

score 0 · Accepted Answer

来自Kamil Bartoń的解决方案是： in stdizeFit, set evaluate=TRUE，否则它会产生调用而不是拟合模型对象，这不是您（和dredge）所期望的。

r - 包括泊松模型中的偏移量是候选模型的两倍

2 回答 2

Related

Reference