r - Why am I getting "algorithm did not converge" and "fitted probabilities numerically 0 or 1" warnings with glm?

Question

This is probably a very simple question; I just can't seem to figure it out.

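I'm running a logit using the glm function, but I keep getting warning messages relating to the independent variable. They're stored as factors, and I've changed them to numeric with no luck. I also coded them as 0/1, but that didn't work either.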

Please help!

> mod2 <- glm(winorlose1 ~ bid1, family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

I also tried it in Zelig, but got a similar error:

> mod2 = zelig(factor(winorlose1) ~ bid1, data=dat, model="logit")
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

EDIT:

> str(dat)
'data.frame':   3493 obs. of  3 variables:
 $ winorlose1: int  2 2 2 2 2 2 2 2 2 2 ...
 $ bid1      : int  700 300 700 300 500 300 300 700 300 300 ...
 $ home      : int  1 0 1 0 0 0 0 1 0 0 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:63021] 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 ...
  .. ..- attr(*, "names")= chr [1:63021] "3494" "3495" "3496" "3497" ...

score 45 · Accepted Answer

If you look at ?glm (or even just Google your second warning message), you may stumble across this in the documentation:

For the background to warning messages about "fitted probabilities numerically 0 or 1 occurred" for binomial GLMs, see Venables & Ripley (2002, pp. 197-8).

Now, not everyone has that book. But assuming it's kosher for me to quote it, here is the relevant passage:

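There is one fairly common circumstance in which both convergence problems and the Hauck-Donner phenomenon can occur. This is when the fitted probabilities are extremely close to zero or one. Consider a medical diagnosis problem with thousands of cases and around 50 binary explanatory variables (which may arise from coding fewer categorical variables); one of these indicators is rarely true but always indicates that the disease is present. Then the fitted probabilities of cases with that indicator should be one, which can only be achieved by taking β_i = ∞. The result from glm will be warnings and an estimated coefficient of around +/- 10. There has been fairly extensive discussion of this in the statistical literature, usually claiming non-existence of maximum likelihood estimates; see Santner and Duffy (1989, p. 234).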
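To see the passage's point in R, here is a minimal sketch on made-up data (the names `x` and `y` and all of the counts are assumptions, not the asker's data): a rarely-true indicator that always implies a positive outcome. The maximum likelihood estimate for its coefficient does not exist (it is +∞), so glm stops at a large finite value with an enormous standard error. Whether glm also prints the two warnings depends on how far the iterations get before the deviance stops changing, so this small example may finish without them; the inflated coefficient and standard error are the symptom to look for.

```r
# Hypothetical toy data mimicking V&R's scenario: a rarely-true binary
# indicator x that, whenever it is 1, always coincides with y = 1.
x <- rep(c(0, 1), times = c(950, 50))
y <- c(rep(c(0, 1), times = c(665, 285)),  # x = 0: 30% positives
       rep(1, 50))                         # x = 1: every case positive

# Ordinary logistic regression; any warnings glm emits are printed as usual.
fit <- glm(y ~ x, family = binomial)

# The MLE for the x coefficient is +Inf, so glm stops at a large finite
# value once the deviance stops changing, with a huge standard error.
print(summary(fit)$coefficients)

# Fitted probabilities for the x == 1 cases are pushed towards 1.
print(range(fitted(fit)[x == 1]))
```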

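One of the book's authors comments on this in somewhat more detail here. So the lesson is to look carefully at one of the levels of your predictor. (And Google the warning message!)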
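For a discrete predictor like `bid1`, following that advice amounts to cross-tabulating the response against each level and looking for a level at which only one outcome ever occurs. A minimal sketch; the helper `separating_levels()` and the toy counts are hypothetical, and on the real data you would call it as `separating_levels(dat$winorlose1, dat$bid1)`.

```r
# Hypothetical helper: return the levels of a discrete predictor x at which
# the binary response y takes only one observed value (a sign of
# quasi-complete separation).
separating_levels <- function(y, x) {
  tab <- table(x, y)                 # rows: predictor levels, cols: outcomes
  pure <- rowSums(tab > 0) == 1      # exactly one outcome seen at that level
  rownames(tab)[pure]
}

# Toy data loosely shaped like the question's (bid1 = 300/500/700,
# outcome coded 1/2); the counts are made up for illustration.
toy <- data.frame(
  bid1       = rep(c(300, 500, 700), times = c(6, 4, 5)),
  winorlose1 = c(1, 2, 1, 2, 2, 1,  1, 2, 2, 1,  2, 2, 2, 2, 2)
)

print(table(toy$bid1, toy$winorlose1))
print(separating_levels(toy$winorlose1, toy$bid1))  # [1] "700"
```

An empty result means no single level separates the outcome, although separation can still arise from combinations of predictors.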