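The exercise is to aggregate a numeric vector of values by a combination of factors, using data.table in R. Take the following data table as an example: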
require (data.table)
require (plyr)
dtb <- data.table (cbind (expand.grid (month = rep (month.abb[1:3], each = 3),
                                       fac = letters[1:3]),
                          value = rnorm (27)))
Note that every unique combination of 'month' and 'fac' appears 3 times. So when I average the values by these two factors, I expect a data frame with 9 unique rows:
(agg1 <- ddply (dtb, c ("month", "fac"), function (dfr) mean (dfr$value)))
month fac V1
1 Jan a -0.36030953
2 Jan b -0.58444588
3 Jan c -0.15472876
4 Feb a -0.05674483
5 Feb b 0.26415972
6 Feb c -1.62346772
7 Mar a 0.24560510
8 Mar b 0.82548140
9 Mar c 0.18721114
However, when I aggregate with data.table, I keep getting the result repeated for every redundant combination of the two factors:
(agg2 <- dtb[, value := mean (value), by = list (month, fac)])
month fac value
1: Jan a -0.36030953
2: Jan a -0.36030953
3: Jan a -0.36030953
4: Feb a -0.05674483
5: Feb a -0.05674483
6: Feb a -0.05674483
7: Mar a 0.24560510
8: Mar a 0.24560510
9: Mar a 0.24560510
10: Jan b -0.58444588
11: Jan b -0.58444588
12: Jan b -0.58444588
13: Feb b 0.26415972
14: Feb b 0.26415972
15: Feb b 0.26415972
16: Mar b 0.82548140
17: Mar b 0.82548140
18: Mar b 0.82548140
19: Jan c -0.15472876
20: Jan c -0.15472876
21: Jan c -0.15472876
22: Feb c -1.62346772
23: Feb c -1.62346772
24: Feb c -1.62346772
25: Mar c 0.18721114
26: Mar c 0.18721114
27: Mar c 0.18721114
month fac value
Is there an elegant way in data.table to collapse these results to one row per unique combination of the factors?
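For reference, one possible approach is sketched below. It relies on the documented data.table behaviour that `:=` adds or updates a column by reference and keeps every row, whereas a plain expression in `j` returns one row per group. The `set.seed` call is an assumption added only to make the sketch reproducible, so the values will not match the output above.

```r
library(data.table)

set.seed(1)  # assumed seed, not in the original post; only for reproducibility
dtb <- data.table(cbind(expand.grid(month = rep(month.abb[1:3], each = 3),
                                    fac = letters[1:3]),
                        value = rnorm(27)))

# `:=` would overwrite `value` in place and keep all 27 rows.
# Returning a list from j (without `:=`) yields one row per (month, fac) group.
agg2 <- dtb[, .(value = mean(value)), by = .(month, fac)]

print(agg2)
print(nrow(agg2))
```

If `dtb` has already been modified by the `:=` call, its `value` column holds the group means, so `unique(dtb)` should also reduce it to one row per combination.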