r - R-multiple 与不平衡设计输出的比较不同于 Anova

Question

在我的实验中，我剪下植物并测量它们的反应，例如在季节结束时产生的叶片质量。我操纵了剪裁强度和剪裁时间，并跨越了这两种治疗方法。我还包括了一个控制剪裁处理，产生了 5 种不同的剪裁处理组合。每次处理 12 株植物，我在两年内跟踪了总共 60 株植物。也就是说，我在第 1 年收集了这 60 株植物的测量值，并在第 2 年再次收集了相同的植物。

这是我的设计，其中时间下的“从不”和强度下的“零”任意替换了“控制”处理：

 Year   Timing  intensity   treatments
 2015   early   high       early-high
 2015   early   low        early-low
 2015   late    high       late-high
 2015   late    low        late-low
 2015   never   zero       control
 2014   early   high       early-high
 2014   early   low        early-low
 2014   late    high       late-high
 2014   late    low        late-low
 2014   never   zero       control

我按照 Ben Bolker 的一项建议忽略了运行 lme4 的警告，然后对模型进行了 F 检验（R- 分析重复测量不平衡设计与 lme4？）：

m1<-lmer(log(plant.leaf.g)~timing*intensity*year+(1|id), data=cmv)

Anova(m1, type="III", test="F")

方差分析输出给了我时间和强度之间的显着相互作用（p = 0.006），然后我使用以下方法进行了多重比较测试：

cmv$SHD<-interaction(cmv$timing, cmv$intensity)
m2<-lmer(log(plant.leaf.g)~-1+SHD+(1|id),data=cmv,  na.action=na.exclude)
summary(glht(m2, linfct=mcp(SHD="Tukey")))

这是我输出的剪辑，其中 p=0.08 的唯一重要对：

                                Estimate Std. Error z value Pr(>|z|)  
late.2014 - early.2014 == 0   -0.6584     0.3448  -1.910   0.3844  
never.2014 - early.2014 == 0   0.1450     0.4102   0.354   0.9992  
early.2015 - early.2014 == 0  -0.4906     0.2786  -1.761   0.4788  
late.2015 - early.2014 == 0   -0.1687     0.3494  -0.483   0.9965  
never.2015 - early.2014 == 0   0.4201     0.4079   1.030   0.9032  
never.2014 - late.2014 == 0    0.8034     0.4119   1.951   0.3597  
early.2015 - late.2014 == 0    0.1678     0.3419   0.491   0.9963  
late.2015 - late.2014 == 0     0.4897     0.2724   1.797   0.4553  
never.2015 - late.2014 == 0    1.0785     0.4119   2.618   0.0885 .
early.2015 - never.2014 == 0  -0.6356     0.4074  -1.560   0.6133

为什么 Anova 认为时间*强度非常显着，但在我的多重比较测试中没有显着性？还有另一种方法我应该进行多重比较吗？

在其他多重比较输出中，我得到的 p 值高达 1.00000，这正常吗？

data<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L, 
111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L, 122L, 
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 
134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 
146L, 147L, 148L, 149L, 150L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 
98L, 99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 
110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L, 
122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 
133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 
144L, 146L, 147L, 148L, 149L, 150L), quad = c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), year = c(2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L), timing = structure(c(1L, 
3L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 2L, 
3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 3L, 
1L), .Label = c("early", "late", "never"), class = "factor"), 
intensity = structure(c(2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 
1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 
1L, 3L, 1L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 
2L, 1L, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 
1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 1L, 
1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 2L
), .Label = c("high", "low", "zero"), class = "factor"), 
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 
2L, 3L, 2L, 4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L, 
4L, 1L, 4L, 2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 2L, 3L, 5L, 
3L, 2L, 1L, 3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 4L, 5L, 2L, 
2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 2L, 3L, 2L, 
4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L, 4L, 1L, 4L, 
2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 4L, 3L, 5L, 2L, 1L, 3L
), .Label = c("control", "early-high", "early-low", "late-high", 
"late-low"), class = "factor"), plant.leaf.g = c(846.216, 
382.704, 2393.088, 61.832, 1315.86, 275.816, 3705.862, 3500.52, 
67.482, 432, 487.492, 1228.618, 776.16, 1575, 735.9, 2417.75, 
1342.92, 2359.046, 686.726, 1385.856, 343.684, 2277.312, 
465.528, 2314.584, 508.4, 1243.644, 1064.448, 1020.646, NA, 
494.832, 1318.248, 1516.4, 1271.218, 512.512, 157.878, 3753.992, 
586.032, 1042.176, 889.632, 651.052, 498.042, 625.872, 16.28, 
497.51, 593.75, 706.84, 2238.742, 232.584, 671.532, 90.72, 
1412.442, 902.728, 3077.184, 619.106, 0.576, 400.452, 684.522, 
849.852, 152.76, 1280.448, 274.47, 387.614, 98.496, 2304.504, 
644.952, 35.392, 250.56, 267.33, 2212.08, 2392.596, 751.944, 
629.418, 731.544, 1013.196, 1516.4, 130.536, 2910.6, 554.4, 
2163.35, 223.86, 2369.376, 551.976, 985.6, 1482.24, 815.386, 
1664.132, 596.376, 1581.432, 217.128, 1041.656, 951.168, 
256.172, 1587.148, 359.448, 546.48, 1226.544, 371.64, 293.504, 
177.726, 343.26, 691.24, 207.604, 588.924, 1405.258, 136.17, 
451.432, 576.18, 424.804, 884.534, 2466.45, 1524.432, 973.208, 
369.474, 410.048)), .Names = c("id", "quad", "year", "timing", 
"intensity", "treatment", "plant.leaf.g"), class = "data.frame", row.names = c(NA, 
-114L))

PS。我一辈子都无法使用 lsmeans 来处理这种不平衡的设计。输出中报告了很多 NA。

score 0 · Accepted Answer

因为我没有仔细阅读您的问题，所以我没有意识到您没有尝试过lsmeans使用treatment代替的模型timing*intensity（由于某种原因，您提供的数据集具有不同的名称，而treatment不是SHD）。如果你这样做，它工作得很好：

> m3<-lmer(log(plant.leaf.g) ~ treatment+year+(1|id), data=data)
> library(lsmeans)
> lsmeans(m3, "treatment", type = "response")
Loading required namespace: pbkrtest
 treatment   response       SE    df lower.CL  upper.CL
 control    1017.7290 289.1544 62.29 576.7671 1795.8244
 early-high  909.2335 260.3904 68.44 513.4725 1610.0288
 early-low   388.1875 116.3790 65.92 213.3433  706.3242
 late-high   626.5379 176.6823 56.61 356.1791 1102.1134
 late-low    393.3225 125.4142 51.60 207.4053  745.8947

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> pairs(.Last.value, type = "response")
 contrast               response.ratio        SE    df t.ratio p.value
 control - early-high        1.1193264 0.4457451 72.14   0.283  0.9986
 control - early-low         2.6217461 1.0685111 71.26   2.365  0.1371
 control - late-high         1.6243693 0.6501965 59.75   1.212  0.7445
 control - late-low          2.5875182 1.1050617 56.07   2.226  0.1853
 early-high - early-low      2.3422535 0.9581385 74.29   2.081  0.2394
 early-high - late-high      1.4512026 0.5762732 68.49   0.938  0.8811
 early-high - late-low       2.3116745 0.9907174 58.53   1.955  0.3006
 early-low - late-high       0.6195754 0.2549274 61.77  -1.163  0.7719
 early-low - late-low        0.9869446 0.4319594 57.88  -0.030  1.0000
 late-high - late-low        1.5929371 0.6780835 53.73   1.094  0.8090

P value adjustment: tukey method for comparing a family of 5 estimates 
Tests are performed on the log scale

现在，关于你原来的问题。我们看到上面的最小 P 值约为 0.13，而

> library(car)
> Anova(m3)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: log(plant.leaf.g)
           Chisq Df Pr(>Chisq)
treatment 9.5752  4    0.04823
year      0.1147  1    0.73484

因此 ANOVA 检验的treatmentP 值约为 0.05。ANOVA $F$ 检验几乎不显着等同于说基于 Scheffe 临界值的水平之间的一些对比几乎不显着。treatment如果这种对比恰好是成对比较，或者几乎是成对比较，那么这种成对比较就会很重要。但这里不是这样。第一个和第二个均值比第三个和第五个要高得多，这导致我得出这个结果，对比非常重要：

> contrast(lsmeans(m3, "treatment"), list(my.con = c(1, 1, -1, 0, -1)))
 contrast estimate        SE    df t.ratio p.value
 my.con   1.801813 0.5910685 64.56   3.048  0.0033

Results are given on the log (not the response) scale. 
Tests are performed on the log scale

另一个比较显着的对比是高强度和低强度的比较：

> contrast(lsmeans(m3, "treatment"), list(hi.lo = c(0, 1, -1, 1, -1)/2))
 contrast  estimate        SE    df t.ratio p.value
 hi.lo    0.6583465 0.2967584 60.45   2.218  0.0303

请记住，ANOVA 测试并不能保证成对比较的结果。

score 0 · Accepted Answer

对此又开了一枪。您的 OP 知道这一点，但只是为了让其他人清楚，这里看看这些因素之间的关系：

R> with(data, table(timing, intensity, year))
, , year = 2014

       intensity
timing  high low zero
  early   11  11    0
  late    13  10    0
  never    0   0   12

, , year = 2015

       intensity
timing  high low zero
  early   12  11    0
  late    12  10    0
  never    0   0   12

注意timing = "never"withintensity = "zero"是一个特殊的控制条件，只有这些因素的其他水平组合出现。这就是为什么将它们视为单独因素的模型会导致解释困难。该因子treatment包含在数据集中，具有实际发生的 5 种组合的级别。

查看一些模型（我查看了残差图，我认为平方根变换是最好的）：

R> library("lme4")
R> m3 = lmer(sqrt(plant.leaf.g) ~ treatment + year + (1|id), data=data)
R> m4 = lmer(sqrt(plant.leaf.g) ~ treatment * year + (1|id), data=data)
Warning message:
Some predictor variables are on very different scales: consider rescaling

比较这两个模型：

R> anova(m3, m4)
refitting model(s) with ML (instead of REML)
Data: data
Models:
m3: sqrt(plant.leaf.g) ~ treatment + year + (1 | id)
m4: sqrt(plant.leaf.g) ~ treatment * year + (1 | id)
   Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
m3  8 876.92 898.74 -430.46   860.92                         
m4 12 872.14 904.87 -424.07   848.14 12.783      4    0.01238

year尽管有警告消息，但似乎还是值得包含与的交互。

为了解释这个模型，ANOVA 表的用途有限。查看模型预测的内容并进行适当的比较会提供更多信息。这些预测称为“最小二乘均值”。我们将每年分别计算它们（因为与的交互year）：

R> library("lsmeans")
R> (m4.lsm = lsmeans(m4, ~ treatment | year, at = list(year = c(2014, 2015)), type = "response"))
year = 2014:
 treatment   response       SE    df lower.CL  upper.CL
 control    1082.1190 224.6040 91.46 681.9805 1574.2165
 early-high 1149.8090 236.6617 97.92 728.1153 1667.4202
 early-low   647.5407 180.8791 92.57 338.1453 1056.5696
 late-high   490.0813 148.4696 82.72 239.2544  829.8841
 late-low    485.0953 171.6529 73.61 203.3380  887.4499

year = 2015:
 treatment   response       SE    df lower.CL  upper.CL
 control    1393.5241 254.9350 91.35 933.1535 1945.8958
 early-high  831.3529 192.9245 97.47 492.5582 1258.3147
 early-low   746.0977 199.8967 96.30 402.0731 1195.6255
 late-high  1050.4552 225.8614 83.42 649.2805 1547.6725
 late-low    520.3968 177.7890 73.61 226.4119  934.9789

Confidence level used: 0.95 
Intervals are back-transformed from the sqrt scale

最后我们可以对它们进行比较。可以进行所有成对比较，但也许对我来说更有用的是构建包含 4 df 的可解释细分的对比treatment：

R> trt.con = data.frame(
+     timing = c(0, 1, 1, -1, -1)/2,
+     intensity = c(0, 1, -1, 1, -1)/2,
+     tim.int = c(0, 1, -1, -1, 1)/2,
+     ctl.vs.trt = c(4, -1, -1, -1, -1)/4,
+     row.names = levels(data$treatment))
R> trt.con
           timing intensity tim.int ctl.vs.trt
control       0.0       0.0     0.0       1.00
early-high    0.5       0.5     0.5      -0.25
early-low     0.5      -0.5    -0.5      -0.25
late-high    -0.5       0.5    -0.5      -0.25
late-low     -0.5      -0.5     0.5      -0.25

这些包括要应用于treatment最小二乘均值的系数。例如，用于计时的比较两个早期计时的平均值和两个延迟计时的平均值。第三列是交互作用，第四列将控制条件与其他四个的平均值进行比较。现在我们可以测试这些对比：

R> contrast(m4.lsm, trt.con)
year = 2014:
 contrast     estimate       SE    df t.ratio p.value
 timing      7.5964984 3.586509 86.29   2.118  0.0370
 intensity   4.2874559 3.569964 87.92   1.201  0.2330
 tim.int     4.1745568 3.508331 94.05   1.190  0.2371
 ctl.vs.trt  7.0159980 3.800634 95.97   1.846  0.0680

year = 2015:
 contrast     estimate       SE    df t.ratio p.value
 timing      0.4625233 3.606278 87.74   0.128  0.8982
 intensity   5.5584603 3.596573 88.42   1.545  0.1258
 tim.int    -4.0400583 3.529456 94.91  -1.145  0.2552
 ctl.vs.trt  9.4872065 3.810418 95.75   2.490  0.0145

一个有趣的结果是，有一些证据表明控制条件与治疗在这两年中有所不同，并且timing仅在 2014 年对比才显着。

（我意识到这已经成为一个 CrossValidated 风格的答案，而不是 StackExchange 的答案；但最终，做一些有意义的事情胜过仅仅让程序运行。）

r - R-multiple 与不平衡设计输出的比较不同于 Anova

2 回答 2

Related

Reference