1

当我进行数据分析时,我遇到了一个严重的问题。

我想研究治疗强度如何影响响​​应变量 y。实地考察是在很久以前的两年前在不同的SITES进行的。据我所知,group_bySITESYEAR获得 y 的均值或总和值并不是很好,尤其是不包括 sum,因为在不同的SITES.

然后我尝试glmmTMB使用包构建负二项式模型并检查DHARMa:我的残差模式不太好。

library(glmmTMB)
model <- glmmTMB::glmmTMB(y~intensity+YEAR+(1|SITES),data = f,family = 'nbinom2')

res = simulateResiduals(model)
res %>% plot() 

我试图检查它是否也违反了空间自相关的假设DHARMa。它表明严重违反假设(p 值 = 0.0083)。所以,我试图解决这个问题。

res2 <- recalculateResiduals(res, group = f$SITES)
testSpatialAutocorrelation(res2, 
                           x = f %>% group_by(SITES) %>% summarise(lat=mean(lat)) %>%  pull(lat),
                           y = f %>% group_by(SITES) %>% summarise(long=mean(long)) %>%  pull(long)
)



    DHARMa Moran's I test for distance-based autocorrelation

data:  res2
observed = 0.1135, expected = -0.0714, sd = 0.0701, p-value = 0.0083
alternative hypothesis: Distance-based autocorrelation

因为通过glmmTMB允许将坐标添加到方差结构中创建的模型。因此,我尝试使用以下 3 种方法将坐标添加到上述模型:

  • mat()
  • gau()
  • exp()
temp$pos <- numFactor(f$long,f$lat)  # coordinate
# mat
FormA1 = formula(y ~ intensity + YEAR + mat(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + mat(pos + 0 | SITES))
# gau
FormB1 = formula(y ~ intensity + YEAR + gau(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + gau(pos + 0 | SITES))
# exp
FormA1 = formula(y ~ intensity + YEAR + exp(pos + 0 | SITES))
FormB2 = formula(y ~ intensity + YEAR + (1|SITES) + exp(pos + 0 | SITES))





model <- glmmTMB::glmmTMB(FormA1,  #FormA,FormB,FormC I tried one by one.
                            data = f,
                            family = 'nbinom2')


但结果是只有exp()方差结构可以运行。gau()和的方差结构mat()会告诉警告:

Warning in fitTMB(TMBStruc) :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
Warning in fitTMB(TMBStruc) :
  Model convergence problem; false convergence (8). See vignette('troubleshooting')

即使exp()可以运行,但当我尝试检查空间相关性时,也严重违反了假设(p 值 = 0.0087)。

所以,我放弃了将坐标添加到方差结构中的方法glmmTMB,转而使用glmmPQL. 因为,它可以给出correlation variance structureforMASS::glmmPQL和类似的gls()lme()nlme包中。

相关结构为:

  • corSpher
  • corSpatial
  • corGau
  • corExp
  • corRatio
  • corLin

但是,当我尝试运行如下代码时:

model <- MASS::glmmPQL(y ~ intensity, random = ~ 1|SITES,family = negative.binomial(theta = 0.9),
    correlation = corExp(form = ~  long+lat|SITES),  #corExp,corGau, corSpher, corLin ,corRatio, all I tried
    data = f)  

但他们总是显示错误:Error in getCovariate.corSpatial(object, data = data):cannot have zero distances in "corSpatial".

考虑到我的实地调查情况,必须有人考虑对我的数据进行分组并获得y的总和或平均值,但问题是:

  1. 由于组内不同的观察结果,组YEARySITES的总和必须产生偏差误差。
  2. 当 group by 获得 y 的平均值时可能是合适的,但我不想缩小观察的数量。.

我在和其他网站上寻找了很多stackoverflow 关于如何解决问题的帖子:cannot have zero distances in "corSpatial". 但是我没有得到很好的分辨率,包括有人说:障碍是数据中的重复坐标。当然,他说的对,但我已经把lon + lat|SITES相关结构,它也显示cannot have zero distances in "corSpatial"。所以,我意识到我必须从互联网上寻求帮助。

所以,我想说的问题是:根据我的数据,如何创建一个不违反残差模式和空间自相关的好的混合线性模型。

当然,你可以使用其他模型,但最好不要使用贝叶斯,因为这对我来说真的很奇怪。

如果您真的想帮助我,我希望您可以发布您的代码,以便我可以再次运行它,而不是在这里输入一些句子。我已经上传了我的部分原始数据,以便为您提供支持。我真的很想得到互联网上所有朋友的帮助。

在我的时区午夜 3 点提前致谢。

这是我的数据:

f  = 
structure(list(YEAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2000", 
"2001"), class = "factor"), SITES = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L), .Label = c("A1", "A10", "A11", "A12", "A17", 
"A18", "A20", "A3", "A8", "A9", "B1", "B14", "B16", "B3", "B6"
), class = "factor"), long = c(80.41081194, 80.41081194, 80.41081194, 
80.41081194, 80.41081194, 80.41081194, 80.41081194, 80.41081194, 
80.41081194, 80.48462682, 80.48462682, 80.48462682, 80.48462682, 
80.48462682, 80.48462682, 80.48462682, 80.48462682, 80.48462682, 
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.57384333, 
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.48125, 
80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 
80.48125, 80.56281944, 80.56281944, 80.56281944, 80.56281944, 
80.56281944, 80.56281944, 80.56281944, 80.56281944, 80.56281944, 
80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 
80.46875, 80.46875, 80.41125278, 80.41125278, 80.41125278, 80.41125278, 
80.41125278, 80.41125278, 80.41125278, 80.41125278, 80.41125278, 
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.30014935, 
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.26167842, 
80.26167842, 80.26167842, 80.26167842, 80.26167842, 80.26167842, 
80.26167842, 80.26167842, 80.26167842, 80.20558138, 80.20558138, 
80.20558138, 80.20558138, 80.20558138, 80.20558138, 80.20558138, 
80.20558138, 80.20558138, 80.349776, 80.349776, 80.349776, 80.349776, 
80.349776, 80.349776, 80.349776, 80.349776, 80.349776, 80.349776, 
80.349776, 80.349776, 80.373821, 80.373821, 80.373821, 80.373821, 
80.373821, 80.373821, 80.373821, 80.373821, 80.373821, 80.373821, 
80.373821, 80.373821, 80.434011, 80.434011, 80.434011, 80.434011, 
80.434011, 80.434011, 80.434011, 80.434011, 80.434011, 80.434011, 
80.434011, 80.434011, 80.520748, 80.520748, 80.520748, 80.520748, 
80.520748, 80.520748, 80.520748, 80.520748, 80.520748, 80.520748, 
80.520748, 80.520748, 80.507795, 80.507795, 80.507795, 80.507795, 
80.507795, 80.507795, 80.507795, 80.507795, 80.507795, 80.507795, 
80.507795, 80.507795), lat = c(41.17633167, 41.17633167, 41.17633167, 
41.17633167, 41.17633167, 41.17633167, 41.17633167, 41.17633167, 
41.17633167, 41.12161667, 41.12161667, 41.12161667, 41.12161667, 
41.12161667, 41.12161667, 41.12161667, 41.12161667, 41.12161667, 
40.97858194, 40.97858194, 40.97858194, 40.97858194, 40.97858194, 
40.97858194, 40.97858194, 40.97858194, 40.97858194, 41.03570917, 
41.03570917, 41.03570917, 41.03570917, 41.03570917, 41.03570917, 
41.03570917, 41.03570917, 41.03570917, 40.80911056, 40.80911056, 
40.80911056, 40.80911056, 40.80911056, 40.80911056, 40.80911056, 
40.80911056, 40.80911056, 40.80239444, 40.80239444, 40.80239444, 
40.80239444, 40.80239444, 40.80239444, 40.80239444, 40.80239444, 
40.80239444, 40.82870833, 40.82870833, 40.82870833, 40.82870833, 
40.82870833, 40.82870833, 40.82870833, 40.82870833, 40.82870833, 
41.02665248, 41.02665248, 41.02665248, 41.02665248, 41.02665248, 
41.02665248, 41.02665248, 41.02665248, 41.02665248, 40.99086151, 
40.99086151, 40.99086151, 40.99086151, 40.99086151, 40.99086151, 
40.99086151, 40.99086151, 40.99086151, 41.00486149, 41.00486149, 
41.00486149, 41.00486149, 41.00486149, 41.00486149, 41.00486149, 
41.00486149, 41.00486149, 41.021529, 41.021529, 41.021529, 41.021529, 
41.021529, 41.021529, 41.021529, 41.021529, 41.021529, 41.021529, 
41.021529, 41.021529, 40.970129, 40.970129, 40.970129, 40.970129, 
40.970129, 40.970129, 40.970129, 40.970129, 40.970129, 40.970129, 
40.970129, 40.970129, 40.85327, 40.85327, 40.85327, 40.85327, 
40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 
40.85327, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 
41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 
41.045005, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 
41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 
41.083288), y = c(36L, 39L, 45L, 167L, 150L, 113L, 74L, 80L, 
72L, 197L, 249L, 212L, 195L, 173L, 105L, 43L, 44L, 27L, 80L, 
69L, 52L, 306L, 308L, 188L, 123L, 106L, 121L, 27L, 32L, 26L, 
197L, 232L, 245L, 114L, 110L, 170L, 12L, 46L, 57L, 254L, 332L, 
207L, 33L, 95L, 89L, 40L, 81L, 77L, 119L, 122L, 194L, 81L, 54L, 
71L, 34L, 29L, 33L, 136L, 129L, 189L, 42L, 38L, 69L, 53L, 39L, 
55L, 62L, 33L, 49L, 26L, 15L, 31L, 130L, 144L, 79L, 76L, 41L, 
86L, 21L, 25L, 16L, 104L, 115L, 109L, 69L, 98L, 84L, 26L, 19L, 
30L, 10L, 12L, 4L, 13L, 9L, 4L, 27L, 0L, 4L, 6L, 2L, 6L, 3L, 
1L, 0L, 1L, 1L, 1L, 6L, 3L, 5L, 0L, 5L, 2L, 0L, 1L, 2L, 2L, 0L, 
2L, 1L, 4L, 1L, 10L, 6L, 9L, 2L, 2L, 2L, 0L, 5L, 0L, 17L, 7L, 
20L, 33L, 34L, 45L, 1L, 2L, 0L, 2L, 2L, 2L, 5L, 6L, 3L, 12L, 
22L, 12L), intensity = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("low", 
"high"), class = "factor")), class = "data.frame", row.names = c(NA, 
-150L))
4

1 回答 1

1

tl; dr空间模式可以通过空间趋势(xy 平面)很好地解释,您不需要自相关模型。(如果你这样做了,你也可以尝试将此模型拟合到mgcv.)

套餐:

library(glmmTMB)
library(DHARMa)
library(ggplot2); theme_set(theme_bw())
library(tidyverse)
library(colorspace)

我绘制了数据(这里没有复制,它没有显示任何惊人的东西)

ggplot(f, aes(intensity, y, shape=YEAR, fill = SITES)) +
  geom_boxplot() + facet_wrap(~YEAR)

有 15 个站点,因此只有 15 个独特的纬度/经度位置。让我们在空间中按站点绘制残差的平均值(还有其他方法可以做到这一点,但augment很方便)

a <- broom.mixed::augment(model, data = f)
aa <- (a
  %>% group_by(long, lat)
  %>% summarize(across(.resid, mean), .groups = "drop")
)
ggplot(aa, aes(long, lat, colour = .resid),
              size = abs(.resid))) +
  geom_point() +
  scale_color_continuous_diverging() +
  scale_size(range = c(3, 8))

蓝色/红色残差图,显示强烈的 SW/NE 结构

绘图sign(.resid)反而使模式更加清晰:

再次绘制残差图

几乎所有的负残差都在 SW 中,几乎所有的正残差都在 NE 中。

解决方案:将latlong作为预测变量添加到模型中。

model2 <- update(model, . ~. + lat + long)

res2 <- simulateResiduals(model2)
res3 <- recalculateResiduals(res2, group = f$SITES)
locs <- f %>% group_by(SITES) %>% summarise(across(c(lat, long), mean))
testSpatialAutocorrelation(res3, x = locs$long, y = locs$lat)

这会处理显着的自相关,并使整体诊断看起来更好。

于 2022-01-13T01:46:21.330 回答