r - How do I get the same result with aggregate?

Question

How can I get the following result using aggregate?

x=iris[,1:4]
transform(x,"sum"=apply(x,MARGIN=1,FUN=sum))

The output is:

    Sepal.Length Sepal.Width Petal.Length Petal.Width  sum
1            5.1         3.5          1.4         0.2 10.2
2            4.9         3.0          1.4         0.2  9.5
3            4.7         3.2          1.3         0.2  9.4
4            4.6         3.1          1.5         0.2  9.4

(many rows omitted). I just want to understand aggregate better; perhaps it is hard to get the same result from aggregate as from apply.

score 2 · Accepted Answer

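Your question seems at odds with the code I expected to follow it. aggregate is designed to "apply" a given function to columns, but only within the categories defined by the "by" argument. It is meant to "aggregate within particular categories".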

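apply (with its second argument set to 2 rather than the 1 in your code) would use a function over entire columns. There is no grouping variable. Your code runs across rows, over vectors with different meanings and import, so it returns a separate sum of the four different measurements for each individual. Unless some preparation or basis for doing so has been established, that is arguably a meaningless operation.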
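To make the MARGIN distinction concrete, here is a minimal sketch on the same four iris columns. The names row_sums and col_sums are chosen only for illustration; rowSums and colSums are the base R shortcuts for the same two operations.

```r
x <- iris[, 1:4]

# MARGIN = 1: one sum per row (per flower), mixing four different measurements
row_sums <- apply(x, MARGIN = 1, FUN = sum)

# MARGIN = 2: one sum per column (per measurement), over all 150 flowers
col_sums <- apply(x, MARGIN = 2, FUN = sum)

length(row_sums)  # 150, one value per row
length(col_sums)  # 4, one value per column

# The dedicated base functions give the same numbers
stopifnot(isTRUE(all.equal(unname(row_sums), unname(rowSums(x)))))
stopifnot(isTRUE(all.equal(unname(col_sums), unname(colSums(x)))))
```

The row-wise version reproduces the sum column from the question, while the column-wise version is the one that has a direct counterpart in aggregate.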

If you want to use apply in a way that resembles what aggregate does, look at the following:

> sapply( split(iris[,1:4], iris[, 5]), apply, 2, sum)
             setosa versicolor virginica
Sepal.Length  250.3      296.8     329.4
Sepal.Width   171.4      138.5     148.7
Petal.Length   73.1      213.0     277.6
Petal.Width    12.3       66.3     101.3


> aggregate(iris[ ,1: 4], iris[5], FUN=sum)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3

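If your goal is not to do any by-category calculation, you would pass a list containing a single vector whose length equals the number of rows in the data frame, so that every row falls into the same group: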

> aggregate(iris[ ,1: 4], list(rep(1,nrow(iris))),  FUN=sum)
  Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1       1        876.5       458.6        563.7       179.9
> apply(iris[1:4], 2, sum)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       876.5        458.6        563.7        179.9

score 0 · Accepted Answer

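If Rubens is correct, and you want to use apply instead of aggregate, and you are interested in the same expression as in your aggregate post from earlier today, then you can use tapply.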

What is the meaning of ~ in aggregate?

x=iris[,1:4]
names(x)<-c("x1","x2","x3","x4")
aggregate(x1+x2+x3+x4~x1,FUN=sum,data=x)
tapply((x$x1 + x$x2 + x$x3 + x$x4), x$x1, sum)
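As a hedged illustration of why these two calls agree: in the formula, the left-hand side is evaluated row by row and the right-hand side supplies the grouping variable, which is the same split that tapply performs. The names row_total, ag and tp below are introduced only for this sketch.

```r
x <- iris[, 1:4]
names(x) <- c("x1", "x2", "x3", "x4")

# Left-hand side of the formula: one total per row
row_total <- with(x, x1 + x2 + x3 + x4)

# Formula interface: sum the row totals within each distinct value of x1
ag <- aggregate(x1 + x2 + x3 + x4 ~ x1, FUN = sum, data = x)

# Same grouping done by tapply on the precomputed row totals
tp <- tapply(row_total, x$x1, sum)

# Both results are ordered by increasing x1, so the sums line up
stopifnot(isTRUE(all.equal(ag[[2]], as.vector(tp))))
```

Note that grouping by x1 (Sepal.Length) puts flowers with equal sepal length in the same group, which differs from grouping by Species.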

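Edited to add sapply and lapply, modifying DWin's answer so that they give the same answer as tapply and aggregate above, along with rapply, vapply, and the by function, which reformats the tapply output: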

with(x, sapply(split((x1 + x2 + x3 + x4), x1), sum))
with(x, lapply(split((x1 + x2 + x3 + x4), x1), sum))
with(x, rapply(split((x1 + x2 + x3 + x4), x1), sum))
with(x, tapply(      (x1 + x2 + x3 + x4), x1 , sum))
with(x, vapply(split((x1 + x2 + x3 + x4), x1), sum, FUN.VALUE=1))
with(x, by((x1 + x2 + x3 + x4), x1, sum))

I have not yet figured out how to get the same answer with mapply. Well, here is one way, but it is pretty silly:

tapply(mapply(sum, x$x1 , x$x2 , x$x3 , x$x4), x$x1, sum)

Finally, here is a way to use apply (inside tapply) to get the same answer as the other lines above:

tapply(apply((x[,1:4]),1,sum),x$x1,sum)

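One last thing: if you really do want aggregate to return the same answer as the apply statement in your post, that is possible. However, all your apply statement does is sum each individual row. So you have to "trick" aggregate into thinking that every row of the iris data set is its own group, like this: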

x=iris[,1:4]
names(x)<-c("x1","x2","x3","x4")
apply.sums <- transform(x,"sum"=apply(x,MARGIN=1,FUN=sum))
my.factor <- seq(1, nrow(x))
ag.sums <- aggregate(x1+x2+x3+x4~my.factor,FUN=sum,data=x)
round(ag.sums[,2],2) == round(apply.sums[,5],2)
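A slightly more compact way to check the same thing, sketched under the assumption that a single TRUE/FALSE answer is wanted rather than 150 element-wise comparisons. The names d, row_id and ag are hypothetical, and all.equal replaces the rounding step as a tolerance-based comparison.

```r
x <- iris[, 1:4]
names(x) <- c("x1", "x2", "x3", "x4")

# Give every row its own group id, stored alongside the data
d <- cbind(x, row_id = seq_len(nrow(x)))

# One group per row, so each "aggregate" is just that row's total
ag <- aggregate(x1 + x2 + x3 + x4 ~ row_id, FUN = sum, data = d)

# Compare against the plain row sums with a numeric tolerance
same <- isTRUE(all.equal(ag[[2]], unname(rowSums(x))))
same
```

Because aggregate returns its groups sorted by row_id, the totals come back in the original row order and can be compared position by position.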
