r - 如何解决 GLMM 残差模式、空间相关性和零距离误差？

Question

当我进行数据分析时，我遇到了一个严重的问题。

我想研究治疗强度如何影响响应变量 y。实地考察是在很久以前的两年前在不同的SITES进行的。据我所知，group_bySITES并YEAR获得 y 的均值或总和值并不是很好，尤其是不包括 sum，因为在不同的SITES.

然后我尝试glmmTMB使用包构建负二项式模型并检查DHARMa：我的残差模式不太好。

library(glmmTMB)
model <- glmmTMB::glmmTMB(y~intensity+YEAR+(1|SITES),data = f,family = 'nbinom2')

res = simulateResiduals(model)
res %>% plot()

我试图检查它是否也违反了空间自相关的假设DHARMa。它表明严重违反假设（p 值 = 0.0083）。所以，我试图解决这个问题。

res2 <- recalculateResiduals(res, group = f$SITES)
testSpatialAutocorrelation(res2, 
                           x = f %>% group_by(SITES) %>% summarise(lat=mean(lat)) %>%  pull(lat),
                           y = f %>% group_by(SITES) %>% summarise(long=mean(long)) %>%  pull(long)
)



    DHARMa Moran's I test for distance-based autocorrelation

data:  res2
observed = 0.1135, expected = -0.0714, sd = 0.0701, p-value = 0.0083
alternative hypothesis: Distance-based autocorrelation

因为通过glmmTMB允许将坐标添加到方差结构中创建的模型。因此，我尝试使用以下 3 种方法将坐标添加到上述模型：

mat()
gau()
exp()

temp$pos <- numFactor(f$long,f$lat)  # coordinate
# mat
FormA1 = formula(y ~ intensity + YEAR + mat(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + mat(pos + 0 | SITES))
# gau
FormB1 = formula(y ~ intensity + YEAR + gau(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + gau(pos + 0 | SITES))
# exp
FormA1 = formula(y ~ intensity + YEAR + exp(pos + 0 | SITES))
FormB2 = formula(y ~ intensity + YEAR + (1|SITES) + exp(pos + 0 | SITES))





model <- glmmTMB::glmmTMB(FormA1,  #FormA,FormB,FormC I tried one by one.
                            data = f,
                            family = 'nbinom2')

但结果是只有exp()方差结构可以运行。gau()和的方差结构mat()会告诉警告：

Warning in fitTMB(TMBStruc) :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
Warning in fitTMB(TMBStruc) :
  Model convergence problem; false convergence (8). See vignette('troubleshooting')

即使exp()可以运行，但当我尝试检查空间相关性时，也严重违反了假设（p 值 = 0.0087）。

所以，我放弃了将坐标添加到方差结构中的方法glmmTMB，转而使用glmmPQL. 因为，它可以给出correlation variance structureforMASS::glmmPQL和类似的gls()和lme()在nlme包中。

相关结构为：

corSpher
corSpatial
corGau
corExp
corRatio
corLin

但是，当我尝试运行如下代码时：

model <- MASS::glmmPQL(y ~ intensity, random = ~ 1|SITES,family = negative.binomial(theta = 0.9),
    correlation = corExp(form = ~  long+lat|SITES),  #corExp,corGau, corSpher, corLin ,corRatio, all I tried
    data = f)

但他们总是显示错误：Error in getCovariate.corSpatial(object, data = data):cannot have zero distances in "corSpatial".

考虑到我的实地调查情况，必须有人考虑对我的数据进行分组并获得y的总和或平均值，但问题是：

由于组内不同的观察结果，组和YEARySITES的总和必须产生偏差误差。
当 group by 获得 y 的平均值时可能是合适的，但我不想缩小观察的数量。.

我在和其他网站上寻找了很多stackoverflow 关于如何解决问题的帖子：cannot have zero distances in "corSpatial". 但是我没有得到很好的分辨率，包括有人说：障碍是数据中的重复坐标。当然，他说的对，但我已经把lon + lat|SITES相关结构，它也显示cannot have zero distances in "corSpatial"。所以，我意识到我必须从互联网上寻求帮助。

所以，我想说的问题是：根据我的数据，如何创建一个不违反残差模式和空间自相关的好的混合线性模型。

当然，你可以使用其他模型，但最好不要使用贝叶斯，因为这对我来说真的很奇怪。

如果您真的想帮助我，我希望您可以发布您的代码，以便我可以再次运行它，而不是在这里输入一些句子。我已经上传了我的部分原始数据，以便为您提供支持。我真的很想得到互联网上所有朋友的帮助。

在我的时区午夜 3 点提前致谢。

这是我的数据：

f  = 
structure(list(YEAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2000", 
"2001"), class = "factor"), SITES = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L), .Label = c("A1", "A10", "A11", "A12", "A17", 
"A18", "A20", "A3", "A8", "A9", "B1", "B14", "B16", "B3", "B6"
), class = "factor"), long = c(80.41081194, 80.41081194, 80.41081194, 
80.41081194, 80.41081194, 80.41081194, 80.41081194, 80.41081194, 
80.41081194, 80.48462682, 80.48462682, 80.48462682, 80.48462682, 
80.48462682, 80.48462682, 80.48462682, 80.48462682, 80.48462682, 
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.57384333, 
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.48125, 
80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 
80.48125, 80.56281944, 80.56281944, 80.56281944, 80.56281944, 
80.56281944, 80.56281944, 80.56281944, 80.56281944, 80.56281944, 
80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 
80.46875, 80.46875, 80.41125278, 80.41125278, 80.41125278, 80.41125278, 
80.41125278, 80.41125278, 80.41125278, 80.41125278, 80.41125278, 
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.30014935, 
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.26167842, 
80.26167842, 80.26167842, 80.26167842, 80.26167842, 80.26167842, 
80.26167842, 80.26167842, 80.26167842, 80.20558138, 80.20558138, 
80.20558138, 80.20558138, 80.20558138, 80.20558138, 80.20558138, 
80.20558138, 80.20558138, 80.349776, 80.349776, 80.349776, 80.349776, 
80.349776, 80.349776, 80.349776, 80.349776, 80.349776, 80.349776, 
80.349776, 80.349776, 80.373821, 80.373821, 80.373821, 80.373821, 
80.373821, 80.373821, 80.373821, 80.373821, 80.373821, 80.373821, 
80.373821, 80.373821, 80.434011, 80.434011, 80.434011, 80.434011, 
80.434011, 80.434011, 80.434011, 80.434011, 80.434011, 80.434011, 
80.434011, 80.434011, 80.520748, 80.520748, 80.520748, 80.520748, 
80.520748, 80.520748, 80.520748, 80.520748, 80.520748, 80.520748, 
80.520748, 80.520748, 80.507795, 80.507795, 80.507795, 80.507795, 
80.507795, 80.507795, 80.507795, 80.507795, 80.507795, 80.507795, 
80.507795, 80.507795), lat = c(41.17633167, 41.17633167, 41.17633167, 
41.17633167, 41.17633167, 41.17633167, 41.17633167, 41.17633167, 
41.17633167, 41.12161667, 41.12161667, 41.12161667, 41.12161667, 
41.12161667, 41.12161667, 41.12161667, 41.12161667, 41.12161667, 
40.97858194, 40.97858194, 40.97858194, 40.97858194, 40.97858194, 
40.97858194, 40.97858194, 40.97858194, 40.97858194, 41.03570917, 
41.03570917, 41.03570917, 41.03570917, 41.03570917, 41.03570917, 
41.03570917, 41.03570917, 41.03570917, 40.80911056, 40.80911056, 
40.80911056, 40.80911056, 40.80911056, 40.80911056, 40.80911056, 
40.80911056, 40.80911056, 40.80239444, 40.80239444, 40.80239444, 
40.80239444, 40.80239444, 40.80239444, 40.80239444, 40.80239444, 
40.80239444, 40.82870833, 40.82870833, 40.82870833, 40.82870833, 
40.82870833, 40.82870833, 40.82870833, 40.82870833, 40.82870833, 
41.02665248, 41.02665248, 41.02665248, 41.02665248, 41.02665248, 
41.02665248, 41.02665248, 41.02665248, 41.02665248, 40.99086151, 
40.99086151, 40.99086151, 40.99086151, 40.99086151, 40.99086151, 
40.99086151, 40.99086151, 40.99086151, 41.00486149, 41.00486149, 
41.00486149, 41.00486149, 41.00486149, 41.00486149, 41.00486149, 
41.00486149, 41.00486149, 41.021529, 41.021529, 41.021529, 41.021529, 
41.021529, 41.021529, 41.021529, 41.021529, 41.021529, 41.021529, 
41.021529, 41.021529, 40.970129, 40.970129, 40.970129, 40.970129, 
40.970129, 40.970129, 40.970129, 40.970129, 40.970129, 40.970129, 
40.970129, 40.970129, 40.85327, 40.85327, 40.85327, 40.85327, 
40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 
40.85327, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 
41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 
41.045005, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 
41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 
41.083288), y = c(36L, 39L, 45L, 167L, 150L, 113L, 74L, 80L, 
72L, 197L, 249L, 212L, 195L, 173L, 105L, 43L, 44L, 27L, 80L, 
69L, 52L, 306L, 308L, 188L, 123L, 106L, 121L, 27L, 32L, 26L, 
197L, 232L, 245L, 114L, 110L, 170L, 12L, 46L, 57L, 254L, 332L, 
207L, 33L, 95L, 89L, 40L, 81L, 77L, 119L, 122L, 194L, 81L, 54L, 
71L, 34L, 29L, 33L, 136L, 129L, 189L, 42L, 38L, 69L, 53L, 39L, 
55L, 62L, 33L, 49L, 26L, 15L, 31L, 130L, 144L, 79L, 76L, 41L, 
86L, 21L, 25L, 16L, 104L, 115L, 109L, 69L, 98L, 84L, 26L, 19L, 
30L, 10L, 12L, 4L, 13L, 9L, 4L, 27L, 0L, 4L, 6L, 2L, 6L, 3L, 
1L, 0L, 1L, 1L, 1L, 6L, 3L, 5L, 0L, 5L, 2L, 0L, 1L, 2L, 2L, 0L, 
2L, 1L, 4L, 1L, 10L, 6L, 9L, 2L, 2L, 2L, 0L, 5L, 0L, 17L, 7L, 
20L, 33L, 34L, 45L, 1L, 2L, 0L, 2L, 2L, 2L, 5L, 6L, 3L, 12L, 
22L, 12L), intensity = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("low", 
"high"), class = "factor")), class = "data.frame", row.names = c(NA, 
-150L))

score 1 · Accepted Answer

tl; dr空间模式可以通过空间趋势（xy 平面）很好地解释，您不需要自相关模型。（如果你这样做了，你也可以尝试将此模型拟合到mgcv.）

套餐：

library(glmmTMB)
library(DHARMa)
library(ggplot2); theme_set(theme_bw())
library(tidyverse)
library(colorspace)

我绘制了数据（这里没有复制，它没有显示任何惊人的东西）

ggplot(f, aes(intensity, y, shape=YEAR, fill = SITES)) +
  geom_boxplot() + facet_wrap(~YEAR)

有 15 个站点，因此只有 15 个独特的纬度/经度位置。让我们在空间中按站点绘制残差的平均值（还有其他方法可以做到这一点，但augment很方便）

a <- broom.mixed::augment(model, data = f)
aa <- (a
  %>% group_by(long, lat)
  %>% summarize(across(.resid, mean), .groups = "drop")
)
ggplot(aa, aes(long, lat, colour = .resid),
              size = abs(.resid))) +
  geom_point() +
  scale_color_continuous_diverging() +
  scale_size(range = c(3, 8))

绘图sign(.resid)反而使模式更加清晰：

几乎所有的负残差都在 SW 中，几乎所有的正残差都在 NE 中。

解决方案：将lat和long作为预测变量添加到模型中。

model2 <- update(model, . ~. + lat + long)

res2 <- simulateResiduals(model2)
res3 <- recalculateResiduals(res2, group = f$SITES)
locs <- f %>% group_by(SITES) %>% summarise(across(c(lat, long), mean))
testSpatialAutocorrelation(res3, x = locs$long, y = locs$lat)

这会处理显着的自相关，并使整体诊断看起来更好。

r - 如何解决 GLMM 残差模式、空间相关性和零距离误差？

1 回答 1

Related

Reference