r - Removing the intercept from a GLM with multiple factor predictors

Question

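I am running a binomial logistic regression with a logit link function in R. My response is a factor [0/1], and I have two multilevel factor predictors; call them a and b, where a has 4 factor levels (a1, a2, a3, a4) and b has 9 factor levels (b1, b2, ..., b9). So: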

mod <- glm(y~a+b, family=binomial(logit),data=pretend)
summary(mod)

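The model output then shows all the information about the model, along with the coefficients.

The first factor levels of a and b (a1 and b1) are missing from the summary output. I understand that they are absorbed into the model's intercept as reference levels. I have read that if I want to remove the intercept term and see estimates for these factor levels, I can add -1 or +0 to the model formula, i.e.: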

mod2 <- glm(y~a+b-1, family=binomial(logit),data=pretend)

...or...

mod2 <- glm(y~a+b+0, family=binomial(logit),data=pretend)
summary(mod2)

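In the new model (mod2), the intercept term is gone and factor level a1 of variable a appears in the coefficient list. However, factor level b1 of variable b is still missing, and given that there is no longer an intercept term, how do I interpret the odds ratio for that factor level?

Can someone explain to me how to get the coefficient for b1, and why this happens?

Thank you.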

score 0 · Accepted Answer

You can try adjusting the contrasts. My favorites are

options(contrasts = c('contr.sum','contr.poly'))

Here the assumption is that the a_i's sum to 0 and the b_i's sum to 0 (though it just occurred to me that this may not be the case for GLM). With those contrasts, the output usually leaves off the last level of a and of b, because each can be recovered by taking the negative of the sum of the other a's or b's respectively (since they all sum to 0).

See this question for more reference: https://stats.stackexchange.com/questions/162381/how-to-fit-a-glm-with-sum-to-zero-constraints-in-r-no-reference-level
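A minimal sketch of that recovery step, assuming simulated data in place of the asker's `pretend` data frame (the data and the object names `mod_sum`, `a_est`, `b_est` are made up for illustration). It sets the contrasts per model through the `contrasts` argument of `glm()` rather than changing the global option:

```r
# Simulated stand-in for the asker's data; only the names y, a, b, pretend
# come from the question, the values are random.
set.seed(1)
n <- 2000
pretend <- data.frame(
  a = factor(sample(paste0("a", 1:4), n, replace = TRUE)),
  b = factor(sample(paste0("b", 1:9), n, replace = TRUE))
)
pretend$y <- rbinom(n, 1, 0.5)

# Sum-to-zero contrasts for this model only (no global options() change)
mod_sum <- glm(y ~ a + b, family = binomial(logit), data = pretend,
               contrasts = list(a = "contr.sum", b = "contr.sum"))

cf    <- coef(mod_sum)
a_est <- cf[grep("^a", names(cf))]  # effects for the first 3 levels of a
b_est <- cf[grep("^b", names(cf))]  # effects for the first 8 levels of b

# The omitted last level is minus the sum of the reported ones
a4 <- -sum(a_est)
b9 <- -sum(b_est)
```

Under these contrasts each effect is a deviation (on the log-odds scale) from the intercept, which is the unweighted average over the level combinations, so no single level acts as the reference.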

score 0 · Accepted Answer

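Why do you want to remove the intercept term and obtain the coefficient for a1?

Logistic regression models with factor variables are fitted with the first factor level as the reference. The coefficient (log odds) of that factor level is then fixed at 0, which corresponds to an odds ratio of 1.0.

When comparing log odds between factors (or groups), all resulting log odds of the factor levels refer to the base level. You can therefore calculate odds ratios between the groups and predict how much more or less likely an event is (compared with the base factor level).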
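As a rough base-R illustration of reading odds ratios relative to the base level, the sketch below uses simulated data standing in for the asker's `pretend` data frame (the values are random, and the object names `or`, `ci`, `or_a3_vs_a2` are invented for this example):

```r
# Simulated stand-in for the asker's data; values are random.
set.seed(2)
n <- 2000
pretend <- data.frame(
  a = factor(sample(paste0("a", 1:4), n, replace = TRUE)),
  b = factor(sample(paste0("b", 1:9), n, replace = TRUE))
)
pretend$y <- rbinom(n, 1, 0.5)

mod <- glm(y ~ a + b, family = binomial(logit), data = pretend)

# Odds ratios of each level versus its reference level (a1 or b1);
# [-1] drops the intercept, which is not an odds ratio.
or <- exp(coef(mod)[-1])

# Wald confidence intervals on the odds-ratio scale
ci <- exp(confint.default(mod)[-1, ])

# Odds ratio between two non-reference levels: a3 versus a2
or_a3_vs_a2 <- exp(coef(mod)["aa3"] - coef(mod)["aa2"])
```

The reference levels do not appear in `or` because their odds ratio against themselves is 1 by construction.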

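If there is no longer a reference level for a, I do not know what could serve as the reference for any level of a. If the reference for a is then b1, how do you interpret this? Is there any reference showing that removing the intercept is meaningful? (Really curious, I have not heard of this approach before.)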

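By the way, you do not need the intercept to calculate odds ratios between factor levels. Here is a small example that calculates odds ratios for an arbitrary binomial glm: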

library(oddsratio)
fit.glm <- glm(admit ~ gre + gpa + rank, data = data.glm, family = "binomial") # fit model

# Calculate OR for specific increment step of continuous variable
calc.oddsratio.glm(data = data.glm, model = fit.glm, incr = list(gre = 380, gpa = 5))

  predictor oddsratio CI.low (2.5 %) CI.high (97.5 %)          increment
1       gre     2.364          1.054            5.396                380
2       gpa    55.712          2.229         1511.282                  5
3     rank2     0.509          0.272            0.945 Indicator variable
4     rank3     0.262          0.132            0.512 Indicator variable
5     rank4     0.212          0.091            0.471 Indicator variable

score 0 · Accepted Answer

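Interesting that a1 is given. One would expect one factor level to serve as the "reference" and therefore to have no OR in the output (because it is 1.0).

I think b1 is your reference; it is therefore hidden and its OR is 1.0.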
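A small sketch that checks this reading, assuming simulated data in place of the asker's `pretend` data frame (the values are random; `mod` and `mod2` follow the question's formulas). Dropping the intercept only re-expresses the same model: the four a coefficients become the log odds of each a level at b = b1, and b1 stays the reference for b:

```r
# Simulated stand-in for the asker's data; values are random.
set.seed(3)
n <- 2000
pretend <- data.frame(
  a = factor(sample(paste0("a", 1:4), n, replace = TRUE)),
  b = factor(sample(paste0("b", 1:9), n, replace = TRUE))
)
pretend$y <- rbinom(n, 1, 0.5)

mod  <- glm(y ~ a + b,     family = binomial(logit), data = pretend)
mod2 <- glm(y ~ a + b - 1, family = binomial(logit), data = pretend)

# mod2 lists all four levels of a, but b still starts at bb2
names(coef(mod2))

# Both parameterisations give the same fitted probabilities
all.equal(fitted(mod), fitted(mod2))

# aa1 in mod2 is the intercept of mod: the log odds for a = a1, b = b1
c(mod2 = unname(coef(mod2)["aa1"]), mod = unname(coef(mod)["(Intercept)"]))

# aa2 in mod2 is intercept + aa2 of mod: the log odds for a = a2, b = b1
c(mod2 = unname(coef(mod2)["aa2"]),
  mod  = unname(coef(mod)["(Intercept)"] + coef(mod)["aa2"]))
```

Because the b columns would otherwise be linearly dependent on the full set of a indicator columns, R still has to drop one level of b, so the b coefficients in `mod2` remain log odds ratios against b1, whose own coefficient is implicitly 0.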
