You can use by(). First, set up some data:
R> set.seed(42)
R> testdf <- data.frame(var1=rnorm(100), var2=rnorm(100,2), var3=rnorm(100,3),
                        group=as.factor(sample(letters[1:10],100,replace=TRUE)),
                        year=as.factor(sample(c(2007,2009),100,replace=TRUE)))
R> summary(testdf)
      var1              var2              var3          group      year
 Min.   :-2.9931   Min.   :-0.0247   Min.   :0.30   e      :15   2007:50
 1st Qu.:-0.6167   1st Qu.: 1.4085   1st Qu.:2.29   c      :14   2009:50
 Median : 0.0898   Median : 1.9307   Median :2.98   f      :12
 Mean   : 0.0325   Mean   : 1.9125   Mean   :2.99   h      :12
 3rd Qu.: 0.6616   3rd Qu.: 2.4618   3rd Qu.:3.65   d      :11
 Max.   : 2.2866   Max.   : 4.7019   Max.   :5.46   b      :10
                                                    (Other):26
Now use by():
R> by(testdf[,1:3], testdf$year, colMeans)  ## colMeans, since mean() on a data frame returns NA in current R
testdf$year: 2007
var1 var2 var3
0.04681 1.77638 3.00122
---------------------------------------------------------------------
testdf$year: 2009
var1 var2 var3
0.01822 2.04865 2.97805
R> by(testdf[,1:3], list(testdf$group, testdf$year), colMeans)
## longer answer by group and year suppressed
You still need to reformat this for your table, but it does give you the gist of your answer in one line.
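To make the shape of the by() result a bit more concrete, here is a minimal sketch on a tiny hand-made data frame. The data frame `df` and its values are made up for illustration and are not the `testdf` above. With one grouping factor and more than one value column, the result behaves like a list indexed by the factor levels, and each element is the named vector returned by colMeans().

```r
# Hypothetical toy data, chosen so the group means are easy to check by hand
df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 year = factor(c(2007, 2007, 2009, 2009)))

# One named vector of column means per level of year
res <- by(df[, c("x", "y")], df$year, colMeans)

# Pull out a single group by its level name
m2007 <- res[["2007"]]
m2009 <- res[["2009"]]
print(m2007)
print(m2009)
```

Each element keeps the column names, which is what lets the pieces be stacked with rbind() later on.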
Edit: Further processing can be done via
R> foo <- by(testdf[,1:3], list(testdf$group, testdf$year), colMeans)
R> do.call(rbind, foo)
var1 var2 var3
[1,] 0.62352 0.2549 3.157
[2,] 0.08867 1.8313 3.607
[3,] -0.69093 2.5431 3.094
[4,] 0.02792 2.8068 3.181
[5,] -0.26423 1.3269 2.781
[6,] 0.07119 1.9453 3.284
[7,] -0.10438 2.1181 3.783
[8,] 0.21147 1.6345 2.470
[9,] 1.17986 1.6518 2.362
[10,] -0.42708 1.5683 3.144
[11,] -0.82681 1.9528 2.740
[12,] -0.27191 1.8333 3.090
[13,] 0.15854 2.2830 2.949
[14,] 0.16438 2.2455 3.100
[15,] 0.07489 2.1798 2.451
[16,] -0.03479 1.6800 3.099
[17,] 0.48082 1.8883 2.569
[18,] 0.32381 2.4015 3.332
[19,] -0.47319 1.5016 2.903
[20,] 0.11743 2.2645 3.452
R> do.call(rbind, dimnames(foo))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
[2,] "2007" "2009" "2007" "2009" "2007" "2009" "2007" "2009" "2007" "2009"
You can play with the dimnames some more:
R> expand.grid(dimnames(foo))
Var1 Var2
1 a 2007
2 b 2007
3 c 2007
4 d 2007
5 e 2007
6 f 2007
7 g 2007
8 h 2007
9 i 2007
10 j 2007
11 a 2009
12 b 2009
13 c 2009
14 d 2009
15 e 2009
16 f 2009
17 g 2009
18 h 2009
19 i 2009
20 j 2009
R>
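The reason the expand.grid() rows can be paired with the rows of do.call(rbind, foo) is that both run through the first factor fastest: the by() result is stored like any R array, in column-major order, and expand.grid() varies its first argument fastest. A small sketch on made-up data (the objects `d`, `g`, `y` are hypothetical, and at least two value columns are used so that by() returns a list-like array) shows the two orders agreeing:

```r
# Hypothetical toy data: every (group, year) combination occurs exactly once
d <- data.frame(v = c(1, 2, 3, 4), w = c(10, 20, 30, 40))
g <- factor(c("a", "b", "a", "b"))
y <- factor(c(2007, 2007, 2009, 2009))

res <- by(d, list(g, y), colMeans)

keys <- expand.grid(dimnames(res))   # group varies fastest, then year
vals <- do.call(rbind, res)          # cells taken in the same column-major order

print(cbind(keys, vals))
```

Because each combination holds a single row here, the `v` column of the result is just the original `v` values, which makes it easy to see that row i of `keys` really describes row i of `vals`.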
Edit: And with that we can create a data.frame for the result without resorting to external packages, using only base R:
R> data.frame(cbind(expand.grid(dimnames(foo)), do.call(rbind, foo)))
Var1 Var2 var1 var2 var3
1 a 2007 0.62352 0.2549 3.157
2 b 2007 0.08867 1.8313 3.607
3 c 2007 -0.69093 2.5431 3.094
4 d 2007 0.02792 2.8068 3.181
5 e 2007 -0.26423 1.3269 2.781
6 f 2007 0.07119 1.9453 3.284
7 g 2007 -0.10438 2.1181 3.783
8 h 2007 0.21147 1.6345 2.470
9 i 2007 1.17986 1.6518 2.362
10 j 2007 -0.42708 1.5683 3.144
11 a 2009 -0.82681 1.9528 2.740
12 b 2009 -0.27191 1.8333 3.090
13 c 2009 0.15854 2.2830 2.949
14 d 2009 0.16438 2.2455 3.100
15 e 2009 0.07489 2.1798 2.451
16 f 2009 -0.03479 1.6800 3.099
17 g 2009 0.48082 1.8883 2.569
18 h 2009 0.32381 2.4015 3.332
19 i 2009 -0.47319 1.5016 2.903
20 j 2009 0.11743 2.2645 3.452
R>
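One caveat worth keeping in mind: the pairing above relies on every group/year combination having at least one row. As far as I can tell, an empty combination shows up as NULL in the by() result, rbind() silently skips it, and the rows no longer line up with expand.grid(). The following is only a sketch of one possible guard, on made-up data with one deliberately empty combination; the names `cells` and `keep` are my own and not part of the answer above.

```r
# Hypothetical toy data: there is no row for group "b" in 2009
d <- data.frame(v = c(1, 2, 3), w = c(10, 20, 30))
g <- factor(c("a", "b", "a"), levels = c("a", "b"))
y <- factor(c(2007, 2007, 2009))

res <- by(d, list(g, y), colMeans)

# Copy the cells into a plain list; the empty combination comes back as NULL
cells <- lapply(seq_along(res), function(i) res[[i]])
keep <- !vapply(cells, is.null, logical(1))

# Drop the empty combination from both the keys and the values before binding
keys <- expand.grid(dimnames(res))
out <- cbind(keys[keep, ], do.call(rbind, cells[keep]))
print(out)
```

With the test data in the answer all 20 combinations are populated, so this guard is not needed there.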