r - Using the lapply and ddply functions

Question

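I am trying to use ddply on my sample data (a data frame called z with columns id and y).

My goal is to find the sum of y for the ids that start with 1 (i.e. 1001, 1200, ...), 2 (2100), 3 (3100, 3190), 4, ... 10, 11, ... 65. For example, for ids starting with 1 the sum is 10+11+12=33, and for ids starting with 2 it is 32.

I would like to use an apply function like this: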

>s <- split(z,z$id)
>lapply(s, function(x) colSums(x[, c("y")]))

However, this gives me the sum for each unique id, which is not what I am looking for. Any suggestions would be much appreciated.

score 5 · Accepted Answer

Here is a data.table solution that uses integer division, %/%, to get the number of whole thousands in each id:

library(data.table)
DT <- data.table(z)

x <- DT[,list(sum_y = sum(y)), by = list(id = id %/% 1000)]
x
   id sum_y
1:  1    33
2:  2    54
3:  3    23
4:  4    45
5:  5   123
6: 10    99

You can do something similar with ddply (from the plyr package):

library(plyr)
ddply(z, .(id = id %/% 1000 ), summarize, sum_y = sum(y))
  id sum_y
1  1    33
2  2    54
3  3    23
4  4    45
5  5   123
6 10    99
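
To see why the grouping works, here is a minimal base-R sketch of what %/% does to the ids. The vector name ids is only illustrative; its values are taken from the sample data.

```r
# Illustrative ids taken from the sample data
ids <- c(1001, 1200, 2100, 3190, 10001, 10345)

# Integer division by 1000 drops the last three digits,
# leaving the leading "thousands" part as the grouping key
key <- ids %/% 1000
key

# The remainder operator %% returns the part that was dropped
ids %% 1000
```

Any id between 1000 and 1999 maps to key 1, and any id between 10000 and 10999 maps to key 10, so the key can be used directly in by = or in .( ).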

score 3 · Accepted Answer

Does this give you the answer you expect?

z <- read.table(textConnection("id y
1001 10
1001 11
1200 12
2001 10
2030 12
2100 32
3100 10
3190 13
4100 45
5100 67
5670 56
10001 54
10345 45"),header=TRUE)

result <- tapply(
                 z$y,
                 as.numeric(substr(z$id,1,nchar(z$id)-3)),
                 sum
                )

result
  1   2   3   4   5  10 
 33  54  23  45 123  99

Borrowing a line from @mnel's answer above, this can be simplified to:

result <- tapply(
                 z$y,
                 z$id %/% 1000,
                 sum
                )
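
The two grouping keys agree on this data because every id has at least four digits. As a small check (a sketch only: the ids vector is made up from sample values plus one hypothetical three-digit id, 999), the string-based key becomes NA for ids below 1000, while %/% puts them in group 0:

```r
# 999 is a hypothetical id that does not appear in the sample data
ids <- c(1001, 2030, 10345, 999)

# Key from dropping the last three characters of the id
key_substr <- as.numeric(substr(ids, 1, nchar(ids) - 3))

# Key from integer division
key_intdiv <- ids %/% 1000

key_substr
key_intdiv
```

If ids shorter than four digits can occur, the %/% version is probably the safer choice, since rows with an NA key are left out of the tapply result.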

score 3 · Accepted Answer

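thelatemail provided an effective approach, but I want to point out that the problem is not your understanding of lapply (your code is almost correct) but how you think about the grouping. thelatemail handled that in his solution, and that is the key. I will show your approach adjusted, then how I would actually approach it, and then a version using ave, because I have never used it before :)

Read in the data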

# data borrowed from thelatemail's answer
z <- read.table(textConnection("id y
1001 10
1001 11
1200 12
2001 10
2030 12
2100 32
3100 10
3190 13
4100 45
5100 67
5670 56
10001 54
10345 45"),header=TRUE)

Your code, adjusted

s <- split(z, substring(as.character(z$id), 1, nchar(as.character(z$id)) - 3))
lapply(s, function(x) sum(x[, "y"]))
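
Since lapply returns a list, a variant with sapply may be easier to read. This is a sketch on a shortened copy of the data; the helper name group_key is hypothetical and not part of the original answer.

```r
# Shortened copy of the sample data
z <- data.frame(id = c(1001L, 1001L, 1200L, 2001L, 2030L, 2100L, 10001L, 10345L),
                y  = c(10, 11, 12, 10, 12, 32, 54, 45))

# Hypothetical helper: everything except the last three digits of the id
group_key <- function(id) {
  id <- as.character(id)
  substring(id, 1, nchar(id) - 3)
}

# One data frame per leading-digit group
s <- split(z, group_key(z$id))

# sapply simplifies the per-group sums to a named numeric vector
sums <- sapply(s, function(x) sum(x$y))
sums
```

The names of the result are the group keys as strings, so they sort as text ("10" comes before "2") rather than numerically.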

The approach I would probably take: add a new grouping variable derived from id

z$IDgroup <- substring(as.character(z$id), 1, nchar(as.character(z$id)) - 3)
aggregate(y ~ IDgroup, z, sum)
#similar approach but adds the solution back as a new column
z$group.sum <- ave(z$y, z$IDgroup, FUN=sum)
z
