首先,我不是统计学家,所以如果有一些我看不到的非常明显的东西,我深表歉意。如果您能指出我正确的方向,我将不胜感激!
我想计算我的多元逻辑模型的拟合度——它正在查看与某些环境变量(相位、风向和海况 = 分类、云 = 连续)和调查持续时间相关的二进制存在/不存在数据 (Sp3):
> Model.pres <- glm(Sp3~Phase+Duration+WindDir+Seastate+Clouds, data=Pres_suit, family=binomial)
> summary(Model.pres)
Call:
glm(formula = Sp3 ~ Phase + Duration + WindDir + Seastate + Clouds,
family = binomial, data = Pres_suit)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8884 -0.3752 -0.2772 -0.1987 3.0564
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.5372280 0.3762506 -6.743 1.55e-11 ***
Phase2 0.3922746 0.2375475 1.651 0.098667 .
Phase3 0.3558603 0.2560635 1.390 0.164609
Phase4 0.0805965 0.2381698 0.338 0.735062
Phase5 0.5069505 0.2144266 2.364 0.018068 *
Phase6 0.4925329 0.2212965 2.226 0.026036 *
Phase7 0.2931561 0.2525538 1.161 0.245737
Phase8 -0.3385487 0.2750551 -1.231 0.218383
Duration 0.0311791 0.0089617 3.479 0.000503 ***
WindDirN -0.4908414 0.2650803 -1.852 0.064073 .
WindDirNE -0.6806600 0.3229646 -2.108 0.035071 *
WindDirNW -0.7477190 0.3452913 -2.165 0.030351 *
WindDirS -0.1167498 0.2871772 -0.407 0.684344
WindDirSE -0.3359101 0.2194478 -1.531 0.125842
WindDirSW -0.3228489 0.2366893 -1.364 0.172561
WindDirVA -0.0589474 0.1961462 -0.301 0.763774
WindDirW -0.5573541 0.2216143 -2.515 0.011904 *
Seastate1 -0.3681850 0.2218246 -1.660 0.096954 .
Seastate2 -1.2598255 0.2363087 -5.331 9.75e-08 ***
Seastate3 -2.1172848 0.3083132 -6.867 6.54e-12 ***
Clouds -0.0009123 0.0018059 -0.505 0.613430
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2434.6 on 5783 degrees of freedom
Residual deviance: 2256.7 on 5763 degrees of freedom
AIC: 2298.7
Number of Fisher Scoring iterations: 6
> anoGLM<-anova(Model.pres, test="Chisq")
> anoGLM
Analysis of Deviance Table
Model: binomial, link: logit
Response: Sp3
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5783 2434.6
Phase 7 19.877 5776 2414.7 0.0058417 **
Duration 1 14.558 5775 2400.1 0.0001359 ***
WindDir 8 39.892 5767 2360.2 3.355e-06 ***
Seastate 3 103.248 5764 2257.0 < 2.2e-16 ***
Clouds 1 0.255 5763 2256.7 0.6135572
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
为了表明模型与数据的拟合程度,我决定检查伪 r2 (Nagelkerke) 和 Hosmer-Lemeshow(因为调整后的 r2 不是正确的选择):
> #Hosmer-Lemeshow Goodness of Fit (GOF) Test
> hoslem.test(Pres_suit$Sp3, fitted(Model.pres))
Hosmer and Lemeshow goodness of fit (GOF) test
data: Pres_suit$Sp3, fitted(Model.pres)
X-squared = 24.492, df = 8, p-value = 0.001894
> #Pseudo r2
> library(fmsb)
> NagelkerkeR2(Model.pres)
$N
[1] 5784
$R2
[1] 0.08812984
通过低伪 r2 和显着的 HosLem 检验,我使用DHARMa package
> # simulated quantile residuals
> sim.pres <- simulateResiduals(Model.pres)
> sim.pres
Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
Scaled residual values: 0.6273667 0.3875756 0.3128554 0.5701987 0.3334376 0.3622152 0.674742 0.8650855 0.7833462 0.523428 0.05092012 0.1867278 0.005426142 0.7711206 0.5303359 0.8879234 0.2274915 0.7091264 0.2288981 0.7708964 ...
> plot(sim.pres)
[![plot(sim.pres)][1]][1]
> # Dispersion test
> testDispersion(sim.pres) # tests under and overdispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated
data: simulationOutput
ratioObsSim = 0.99815, p-value = 0.968
alternative hypothesis: two.sided
> # Outlier test
> testOutliers(sim.pres, type = "binomial")
DHARMa outlier test based on exact binomial test with approximate expectations
data: sim.pres
outliers at both margin(s) = 40, observations = 5784, p-value = 0.4155
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.004945089 0.009405336
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0.006915629
[![Dispersion and Outlier test plots][1]][1]
> # testing zero inflation
> testZeroInflation(sim.pres) # tests if there are more zeros than expected
DHARMa zero-inflation test via comparison to expected zeros with simulation under H0 =
fitted model
data: simulationOutput
ratioObsSim = 1, p-value = 0.968
alternative hypothesis: two.sided
[![Zero inflation plot][1]][1]
我只是找不到重要的 HL 测试的原因,有人有什么想法吗?我也在检查其他物种,并且有一些甚至更多的零,但它们没有产生显着的 HL(尽管 Nagelkerke r2 甚至更低)。还有什么我需要检查的吗?