r - tapply() function depending on multiple columns in R

Question

In R, I have a table with the columns Location, sample_year and count. For example:

Location sample_year count  
A        1995        1
A        1995        1  
A        2000        3  
B        2000        1  
B        2000        1  
B        2000        5

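I want a summary table that looks at both the "Location" and "sample_year" columns and sums "count" for each unique combination of the two, rather than for each column on its own. So the end result should be: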

Location sample_year sum_count
A        1995        2
A        2000        3
B        2000        7

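I could paste the columns together into a new column to create a unique Location-sample_year key, but that is not a clean solution, especially if I need to extend it to three columns at some point. There must be a better way.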

score 11 · Accepted Answer

You can use aggregate with a formula.

First the data:

x <- read.table(textConnection("Location sample_year count  
A        1995        1
A        1995        1  
A        2000        3  
B        2000        1  
B        2000        1  
B        2000        5"), header = TRUE)

Then aggregate with sum, using a formula that specifies the grouping:

aggregate(count ~ Location+sample_year, data = x, sum)
    Location sample_year count
1        A        1995     2
2        A        2000     3
3        B        2000     7
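The question mentions possibly extending this to three columns. A minimal sketch of how that might look with the same formula interface, assuming a third grouping column; the `site` column and its values are made up for illustration and are not part of the original data:

```r
# Same counts as above, plus a hypothetical third grouping column `site`
x3 <- data.frame(
  Location    = c("A", "A", "A", "B", "B", "B"),
  sample_year = c(1995, 1995, 2000, 2000, 2000, 2000),
  site        = c("s1", "s2", "s1", "s1", "s1", "s2"),  # invented for this sketch
  count       = c(1, 1, 3, 1, 1, 5)
)

# Each extra grouping column is just another term on the right-hand side
res3 <- aggregate(count ~ Location + sample_year + site, data = x3, FUN = sum)
res3
```

Each distinct Location/sample_year/site combination present in the data gets one row, so nothing has to be pasted into a combined key.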

score 4 · Accepted Answer

Or with the reshape package:

library(reshape)
md <- melt(x, measure.vars = "count")
cast(md, Location + sample_year ~ variable, sum)
  Location sample_year count
1        A        1995     2
2        A        2000     3
3        B        2000     7

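Edit:

I used the x object from @mdsumner's answer. In any case, I suggest you stick with his answer, since it does not depend on an external package (the aggregate function ships with R in the stats package, unless you detach it...). And, by the way, it is faster than the reshape solution.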

score 2 · Accepted Answer

Or plyr (using x from @mdsumner):

library(plyr)
ddply(x, .(Location,sample_year), summarise, count = sum(count))
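Since the title asks about tapply() specifically, here is a rough base R sketch of that route, assuming the same x as in the answers above (rebuilt here so the snippet runs on its own). tapply() takes a list of grouping vectors as its INDEX argument; the names `tab` and `long` are only illustrative:

```r
x <- data.frame(
  Location    = c("A", "A", "A", "B", "B", "B"),
  sample_year = c(1995, 1995, 2000, 2000, 2000, 2000),
  count       = c(1, 1, 3, 1, 1, 5)
)

# Passing a list of grouping vectors gives a matrix with one cell
# per Location x sample_year combination
tab <- tapply(x$count, list(x$Location, x$sample_year), sum)
tab
# Combinations that never occur in the data (here B/1995) come back as NA

# Reshape the matrix to long format and drop the empty combinations
long <- as.data.frame(as.table(tab), stringsAsFactors = FALSE)
names(long) <- c("Location", "sample_year", "sum_count")
long <- long[!is.na(long$sum_count), ]
long
```

Note that tapply() returns a cross-tabulated matrix rather than a data frame, which is why the extra reshaping step is needed to reach the layout asked for in the question; aggregate returns that layout directly.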
