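I am trying to understand the difference between two methods for fitting a dataset with a bounded response variable. The response variable is a fraction, so it lies in [0, 1]. A Google search turned up many different approaches, since this is a common task. I am currently interested in the difference between a stock R GLM fit and the beta regression provided by the betareg package. I am using the GasolineYield dataset from the "betareg" package as my sample dataset. Before I post the code and results, my two questions are:
Am I using the built-in R GLM correctly to perform a logistic regression fit in R?
Why are the standard errors reported by the beta regression so much smaller than those from the R logistic regression?
R setup code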
library(betareg)
data("GasolineYield", package = "betareg")
Beta regression code from the "betareg" package
gy = betareg(yield ~ batch + temp, data = GasolineYield)
summary(gy)
Beta regression summary output
Call:
betareg(formula = yield ~ batch + temp, data = GasolineYield)
Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-2.8750 -0.8149  0.1601  0.8384  2.0483
Coefficients (mean model with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.1595710  0.1823247 -33.784  < 2e-16 ***
batch1       1.7277289  0.1012294  17.067  < 2e-16 ***
batch2       1.3225969  0.1179020  11.218  < 2e-16 ***
batch3       1.5723099  0.1161045  13.542  < 2e-16 ***
batch4       1.0597141  0.1023598  10.353  < 2e-16 ***
batch5       1.1337518  0.1035232  10.952  < 2e-16 ***
batch6       1.0401618  0.1060365   9.809  < 2e-16 ***
batch7       0.5436922  0.1091275   4.982 6.29e-07 ***
batch8       0.4959007  0.1089257   4.553 5.30e-06 ***
batch9       0.3857930  0.1185933   3.253  0.00114 **
temp         0.0109669  0.0004126  26.577  < 2e-16 ***
Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)    440.3      110.0   4.002 6.29e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 84.8 on 12 Df
Pseudo R-squared: 0.9617
Number of iterations: 51 (BFGS) + 3 (Fisher scoring)
R GLM logistic regression code from stock R
glmfit = glm(yield ~ batch + temp, data = GasolineYield, family = "binomial")
summary(glmfit)
R GLM logistic regression summary output
Call:
glm(formula = yield ~ batch + temp, family = "binomial", data = GasolineYield)
Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.100459  -0.025272   0.004217   0.032879   0.082113
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.130227   3.831798  -1.600    0.110
batch1       1.720311   2.127205   0.809    0.419
batch2       1.305746   2.481266   0.526    0.599
batch3       1.562343   2.440712   0.640    0.522
batch4       1.048928   2.152385   0.487    0.626
batch5       1.125075   2.176242   0.517    0.605
batch6       1.029601   2.229773   0.462    0.644
batch7       0.540401   2.294474   0.236    0.814
batch8       0.497355   2.288564   0.217    0.828
batch9       0.378315   2.494881   0.152    0.879
temp         0.010906   0.008676   1.257    0.209
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2.34184 on 31 degrees of freedom
Residual deviance: 0.07046 on 21 degrees of freedom
AIC: 36.631
Number of Fisher Scoring iterations: 5
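To make the second question easier to inspect, here is a minimal sketch that puts the standard errors from the two fits side by side. Note that the binomial output above reports the dispersion as "taken to be 1", while the beta regression estimates a precision parameter (phi). The third column is an assumption on my part and not one of the fits shown above: it refits the same formula with family = "quasibinomial", which estimates the dispersion from the data instead of fixing it. The names qfit and se_table are just illustrative.

```r
library(betareg)
data("GasolineYield", package = "betareg")

# Refit the two models from the post.
gy <- betareg(yield ~ batch + temp, data = GasolineYield)
# glm() warns about non-integer successes here because yield is a
# fraction rather than a count; the warning is suppressed for brevity.
glmfit <- suppressWarnings(
  glm(yield ~ batch + temp, data = GasolineYield, family = "binomial")
)

# Assumption, not in the original post: same formula, but the
# dispersion is estimated rather than fixed at 1.
qfit <- glm(yield ~ batch + temp, data = GasolineYield,
            family = "quasibinomial")

# Pull the standard errors out of each summary object.
se_beta  <- summary(gy)$coefficients$mean[, "Std. Error"]
se_glm   <- summary(glmfit)$coefficients[, "Std. Error"]
se_quasi <- summary(qfit)$coefficients[, "Std. Error"]

se_table <- cbind(betareg = se_beta,
                  binomial = se_glm,
                  quasibinomial = se_quasi)
print(round(se_table, 4))
```

The point estimates of the binomial and quasibinomial fits are the same; only the standard errors are rescaled by the estimated dispersion, so the table may help show how much of the gap comes from the fixed dispersion alone.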