r - Extract p-value from glm

Question

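I am running many regressions and am only interested in the coefficient and p-value of one particular variable. So, in my script, I would like to be able to extract the p-value from the glm summary (getting the coefficient itself is easy). The only way I know of to view the p-value is summary(myReg). Is there another way?

For example: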

fit <- glm(y ~ x1 + x2, data = myData) # data must be named: the 2nd positional argument of glm() is family
x1Coeff <- fit$coefficients[2] # only returns coefficient, of course
x1pValue <- ???

I tried treating fit$coefficients as a matrix, but I still could not simply extract the p-value.

Is it possible to do this?

Thanks!

score 66 · Accepted Answer

You want

coef(summary(fit))[,4]

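which extracts the column vector of p-values from the tabular output shown by summary(fit). The p-values are not actually computed until you run summary() on the model fit.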
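A minimal sketch of the full round trip, assuming simulated data: the names `myData`, `y`, `x1` and `x2` follow the question, but the values are invented here purely for illustration. Indexing by row name rather than by position avoids depending on the order of the terms in the formula.

```r
# Simulated data for illustration only; variable names follow the question.
set.seed(1)
myData <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
myData$y <- 1 + 2 * myData$x1 + rnorm(100)

fit <- glm(y ~ x1 + x2, data = myData)

# coef(summary(fit)) is a numeric matrix: one row per term, one column per statistic.
coefTable <- coef(summary(fit))

x1Coeff  <- coefTable["x1", "Estimate"]  # same value as coef(fit)["x1"]
x1pValue <- coefTable["x1", 4]           # the 4th column holds the p-values
```

Storing the table once in `coefTable` also avoids recomputing the summary for every value you pull out.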

By the way, use extractor functions rather than delving into objects if you can:

fit$coefficients[2]

should be

coef(fit)[2]

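If there is no extractor function, str() is your friend. It lets you look at the structure of any object, so you can see what the object contains and how to extract it: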

summ <- summary(fit)

> str(summ, max = 1)
List of 17
 $ call          : language glm(formula = counts ~ outcome + treatment, family = poisson())
 $ terms         :Classes 'terms', 'formula' length 3 counts ~ outcome + treatment
  .. ..- attr(*, "variables")= language list(counts, outcome, treatment)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. ..- attr(*, "term.labels")= chr [1:2] "outcome" "treatment"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(counts, outcome, treatment)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "factor" "factor"
  .. .. ..- attr(*, "names")= chr [1:3] "counts" "outcome" "treatment"
 $ family        :List of 12
  ..- attr(*, "class")= chr "family"
 $ deviance      : num 5.13
 $ aic           : num 56.8
 $ contrasts     :List of 2
 $ df.residual   : int 4
 $ null.deviance : num 10.6
 $ df.null       : int 8
 $ iter          : int 4
 $ deviance.resid: Named num [1:9] -0.671 0.963 -0.17 -0.22 -0.956 ...
  ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 $ coefficients  : num [1:5, 1:4] 3.04 -4.54e-01 -2.93e-01 1.34e-15 1.42e-15 ...
  ..- attr(*, "dimnames")=List of 2
 $ aliased       : Named logi [1:5] FALSE FALSE FALSE FALSE FALSE
  ..- attr(*, "names")= chr [1:5] "(Intercept)" "outcome2" "outcome3" "treatment2" ...
 $ dispersion    : num 1
 $ df            : int [1:3] 5 4 5
 $ cov.unscaled  : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 $ cov.scaled    : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 - attr(*, "class")= chr "summary.glm"

Hence we notice the coefficients component, which we can extract using coef(), but other components have no extractor, such as null.deviance, which you can extract as summ$null.deviance.
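The `call` component in the `str()` output suggests the fit is the Poisson example from `?glm`; assuming that is the case, the sketch below rebuilds it and shows both access styles side by side.

```r
# Data from the example in ?glm (assumed to be the model behind the str() output).
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

fit  <- glm(counts ~ outcome + treatment, family = poisson())
summ <- summary(fit)

names(summ)                            # component names only, terser than str()
pValues <- coef(summ)[, "Pr(>|z|)"]    # coefficients: use the coef() extractor
nullDev <- summ$null.deviance          # no extractor for this one: use $
```

`names()` is a quick first look; `str(summ, max = 1)` adds the type and size of each component.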

score 7 · Accepted Answer

You can use the column name directly instead of the number

coef(summary(fit))[,'Pr(>|z|)']

Other columns available in the coefficient summary:

Estimate Std. Error z value Pr(>|z|)
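One caveat when indexing by name: the column names depend on the family. Families with a fixed dispersion (binomial, poisson) report `z value` and `Pr(>|z|)`, while families with an estimated dispersion (gaussian, Gamma, the quasi families) report `t value` and `Pr(>|t|)`. A sketch of a small helper that finds the column by pattern, so the same script works in both cases; the name `pValueColumn` and the simulated data are hypothetical.

```r
# Hypothetical helper: return the p-values whatever the column is called.
pValueColumn <- function(fit) {
  tab <- coef(summary(fit))
  # Matches both "Pr(>|z|)" and "Pr(>|t|)".
  col <- grep("^Pr\\(", colnames(tab))
  tab[, col]
}

# Simulated data for illustration only.
set.seed(2)
d <- data.frame(x = rnorm(50))
d$yNum <- d$x + rnorm(50)
d$yBin <- rbinom(50, 1, plogis(d$x))

gaussFit <- glm(yNum ~ x, data = d)                     # column is "Pr(>|t|)"
logitFit <- glm(yBin ~ x, data = d, family = binomial)  # column is "Pr(>|z|)"

pValueColumn(gaussFit)["x"]
pValueColumn(logitFit)["x"]
```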

score 4 · Accepted Answer

I have used this technique in the past to pull out predictor data from summary or from a fitted model object:

coef(summary(m))[grepl("var_i_want$",row.names(coef(summary(m)))), 4]

This lets me easily change which variable I want to get data on.

Or, as pointed out by @Ben, use match or %in%, which is somewhat cleaner than grepl:

coef(summary(m))[row.names(coef(summary(m))) %in% "var_i_want" , 4]
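Since the question involves many regressions, the lookup can be wrapped in a function and applied over a list of models. This is a sketch under assumptions: the helper name `extractTerm` and the simulated data are hypothetical, and `match` is used so that a term missing from a model yields `NA` instead of an empty result.

```r
# Hypothetical helper: coefficient and p-value for one term of a fitted model.
extractTerm <- function(fit, term) {
  tab <- coef(summary(fit))        # compute the summary once, not twice
  i <- match(term, rownames(tab))  # NA if the term is not in the model
  c(estimate = unname(tab[i, 1]), p.value = unname(tab[i, 4]))
}

# Simulated data for illustration only.
set.seed(3)
d <- data.frame(x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
d$y <- 0.5 * d$x1 + rnorm(80)

fits <- list(
  a = glm(y ~ x1 + x2, data = d),
  b = glm(y ~ x1 + x3, data = d)
)

# One row per model, with columns "estimate" and "p.value" for x1.
results <- t(sapply(fits, extractTerm, term = "x1"))
```

Exact matching also avoids a pitfall of the `grepl` version, where a pattern such as `"x1$"` would match any other term whose name ends in `x1`.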

score 2 · Accepted Answer

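The tidy function from the broom package (installed alongside the Tidyverse and available on CRAN) provides a quick and easy way to convert a GLM summary into a data frame, which can be useful in many situations beyond the one you describe above.

In this case, you can get the desired output with the following code: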

x1pValue <- broom::tidy(fit)$p.value[2]

score 1 · Accepted Answer

Well, this would be another way (in Julia, via coeftable), though not the most efficient way to do it:

a = coeftable(model).cols[4]
pVals = [ a[i].v for i in 1:length(a) ]

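This ensures that the values extracted from the glm are no longer wrapped in StatsBase types. From there, you can use pVals however you need. Hope it helps, Ebby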
