我对混合效果和加法模型都很陌生,所以如果这里的答案很简单,我很抱歉。
我收集了关于几种代谢化学物质(M1、M2...)、协变量(时间、种族、性别...)和疾病状态(D、D.binary)的数据。我正在尝试根据从 GEE 变量选择中选择的变量生成 GAMM。
数据:
- 8 例,51 例匹配对照
- 每个受试者大约 10 个时间点
- ~630 次观察
- M1,M2...M3 是代谢物,其中许多是由共同部分形成的,代谢物水平相关,因为它们竞争相同的组成部分
- 协变量将受试者分层为亚组
这是我现在的模型:
> b = gamm(D.binary ~ Time + s(M1) ,
random = list(ParticipantID = ~ 1 + Time), niterPQL=50,
data = NEC_data, family=binomial(link="logit"))
Maximum number of PQL iterations: 50
iteration 1
iteration 2
...
iteration 49
iteration 50
Warning message:
In gammPQL(y ~ X - 1, random = rand, data = strip.offset(mf), family = family, :
gamm not converged, try increasing niterPQL
> plot(b$gam,pages=1)
> summary(b$lme) # details of underlying lme fit
Linear mixed-effects model fit by maximum likelihood
Data: data
AIC BIC logLik
-160 -124 88
Random effects:
Formula: ~Xr - 1 | g
Structure: pdIdnot
Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8
StdDev: 0.812 0.812 0.812 0.812 0.812 0.812 0.812 0.812
Formula: ~1 + Time | ParticipantID %in% g
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 5.68324 (Intr)
Time 0.50739 -0.92
Residual 0.00691
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: list(fixed)
Value Std.Error DF t-value p-value
X(Intercept) -2.81 0.729 573 -3.86 0.0001
XTime -0.15 0.065 573 -2.30 0.0220
Xs(M1)Fx1 -1.60 0.066 573 -24.29 0.0000
Correlation:
X(Int) XTime
XTime -0.920
Xs(M1)Fx1 0.004 0.000
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.3472 -0.0692 -0.0117 0.0305 20.7271
Number of Observations: 636
Number of Groups:
g ParticipantID %in% g
1 61
> summary(b$gam) # gam style summary of fitted model
Family: binomial
Link function: logit
Formula:
NEC.binary ~ Time + s(M1)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.8135 0.7289 -3.86 0.00013 ***
Time -0.1495 0.0651 -2.30 0.02188 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(M1) 4.1 4.1 14913 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0872
Scale est. = 4.7815e-05 n = 636
> anova(b$gam)
Family: binomial
Link function: logit
Formula:
NEC.binary ~ Time + s(M1)
Parametric Terms:
df F p-value
Time 1 5.28 0.022
Approximate significance of smooth terms:
edf Ref.df F p-value
s(M1) 4.1 4.1 14913 <2e-16
> gam.check(b$gam)
我怀疑我可能搞砸了一些相当基本的东西,因为 M1 是疾病状态最明显的鉴别器。它很重要(应该如此),但相关性非常低。此外,很明显,模型没有收敛(即使我将迭代次数从 20 增加到 50)。最后检查图看起来非常离谱
问题
我犯了基本的语法错误吗?我正在查看的模型中是否有一些恶意组件?任何帮助将不胜感激。
进一步的工作
我想在模型中添加另一个代谢物 (M2) 和 2 个协变量(出生体重和种族)。当我将 M2 添加到模型中时,我得到一个非收敛错误:
> b = gamm(D.binary ~ Time + s(M1) + s(M2) ,
random = list(ParticipantID = ~ 1 + Time), niterPQL=20, correlation = corLin(),
data = NEC_data, family=binomial(link="logit"))
Maximum number of PQL iterations: 20
iteration 1
iteration 2
Error in lme.formula(fixed = fixed, random = random, data = data, correlation = correlation, :
nlminb problem, convergence error code = 1
message = false convergence (8)
任何有关进入多维空间的建议也将不胜感激。
添加
我还使用离散疾病分类(对照:0,1 疾病:2,3)和泊松噪声尝试了这个模型。
> b = gamm(NEC ~ DPP + s(DSLNT_ug.mL) ,
+ random = list(ParticipantID = ~ 1 + DPP), niterPQL=20,
+ data = NEC_data, family=poisson)
Maximum number of PQL iterations: 20
iteration 1
iteration 2
...
iteration 19
iteration 20
Error in solve.default(pdMatrix(a, factor = TRUE)) :
system is computationally singular: reciprocal condition number = 3.13906e-19