10

如果我在 R 中有一个线性模型的汇总表,我怎样才能获得与交互估计或组截距等相关的 p 值,而无需计算行数?

例如,对于诸如连续和分类的模型lm(y ~ x + group)x对象group的汇总表lm具有以下估计值:

  1. 拦截
  2. x,所有组的斜率
  3. 5 组内与整体截距的差异
  4. 5 组内与整体斜率的差异。

我想找到一种方法将这些中的每一个作为一组 p 值,即使组数或模型公式发生变化。也许有信息汇总表以某种方式用于将行组合在一起?

以下是具有两个不同模型的示例数据集。第一个模型有四组不同的 p 值,我可能想分别获得,而第二个模型只有两组 p 值。

x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)

myData <- data.frame(x = x,
    y = c(groupA, groupB, groupC, groupD, groupE, groupF),
    group = rep(c("A","B","C","D","E","F"), each = length(x))
)

myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)
4

2 回答 2

17

您可以通过 访问所有系数及其相关统计数据summary()$coefficients,如下所示:

> summary(myMod1)$coefficients
                 Estimate  Std. Error      t value      Pr(>|t|)
(Intercept)  9.8598180335 0.207551769  47.50534335 1.882690e-203
x            0.5013049448 0.003568152 140.49427911  0.000000e+00
groupB       9.9833257879 0.293522526  34.01212819 5.343527e-141
groupC      20.0988336744 0.293522526  68.47458673 2.308586e-282
groupD      30.0671851583 0.293522526 102.43569906  0.000000e+00
groupE      39.8366758058 0.293522526 135.71931370  0.000000e+00
groupF      50.4780382104 0.293522526 171.97330259  0.000000e+00
x:groupB    -0.0001115097 0.005046129  -0.02209807  9.823772e-01
x:groupC     0.0004144536 0.005046129   0.08213297  9.345689e-01
x:groupD     0.0022577223 0.005046129   0.44741668  6.547390e-01
x:groupE     0.0024544207 0.005046129   0.48639675  6.268671e-01
x:groupF    -0.0052089956 0.005046129  -1.03227556  3.023674e-01

其中,您只需要 p 值,即第 4 列:

> summary(myMod1)$coefficients[,4]
  (Intercept)             x        groupB        groupC        groupD        groupE        groupF      x:groupB      x:groupC 
1.882690e-203  0.000000e+00 5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00  9.823772e-01  9.345689e-01 
     x:groupD      x:groupE      x:groupF 
 6.547390e-01  6.268671e-01  3.023674e-01 

最后,您只需要特定系数的 p 值,无论是截距还是交互项。一种方法是通过使用作为索引返回的逻辑向量将系数名称 ( names(summary(myMod1)$coefficients[,4])) 与 RegEx匹配:grepl()grepl

> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
       groupB        groupC        groupD        groupE        groupF 
5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00 
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
 x:groupB  x:groupC  x:groupD  x:groupE  x:groupF 
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674 
于 2012-06-27T17:25:29.207 回答
7

现在有broom包来处理统计函数的输出。在这种情况下,使用它的tidy()功能:

library(broom)
tidy(myMod1)

          term      estimate  std.error   statistic       p.value
1  (Intercept) 10.0379389850 0.19497112  51.4842342 5.143448e-220
2            x  0.5009946732 0.00335187 149.4672019  0.000000e+00
3       groupB  9.8949134549 0.27573081  35.8861368 3.002513e-150
4       groupC 19.8437942091 0.27573081  71.9679981 1.021613e-293
5       groupD 29.9055587100 0.27573081 108.4592579  0.000000e+00
6       groupE 39.7258414666 0.27573081 144.0747296  0.000000e+00
7       groupF 50.1210013973 0.27573081 181.7751231  0.000000e+00
8     x:groupB -0.0005319302 0.00474026  -0.1122154  9.106909e-01
9     x:groupC -0.0010145553 0.00474026  -0.2140294  8.305983e-01
10    x:groupD -0.0025544113 0.00474026  -0.5388757  5.901766e-01
11    x:groupE  0.0045780202 0.00474026   0.9657740  3.345543e-01
12    x:groupF -0.0058636354 0.00474026  -1.2369859  2.165861e-01

结果是一个 data.frame,因此您可以轻松过滤交互项(名称中有冒号):

pvals <- tidy(myMod1)[, c(1,5)]
pvals[grepl(":", pvals$term), ]

       term   p.value
8  x:groupB 0.9106909
9  x:groupC 0.8305983
10 x:groupD 0.5901766
11 x:groupE 0.3345543
12 x:groupF 0.2165861

broomdplyr包配合得很好;例如,提取非交互组系数:

library(dplyr)
tidy(myMod1) %>%
  select(term, p.value) %>%
  filter(! grepl(":", term), term != "(Intercept)", term != "x")

    term       p.value
1 groupB 3.002513e-150
2 groupC 1.021613e-293
3 groupD  0.000000e+00
4 groupE  0.000000e+00
5 groupF  0.000000e+00
于 2015-06-01T20:28:23.153 回答