r - Aggregate a data.frame without a function

Question

I have an example data frame:

test.df<-data.frame(id=c("A","A","A","B","B","B"), time=c(1:3,1:3), x1=c(1,1,1,2,2,2), x2=c("A","A","A","B","B","B"))

The variables x1 and x2 are the same within each id.

I want to aggregate the data frame above to get the following:

target.df<-data.frame(id=c("A","B"), x1=c(1,2), x2=c("A","B"))

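In a sense, I want aggregate without any FUN. I tried FUN=unique, but it didn't seem to work. My original data frame has 1 million rows and thousands of variables x1, x2, ... of different types (character, Date, etc.), but each of them is the same within an ID. This is the same as a pivot table in Excel.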

Many thanks.

score 5 · Accepted Answer

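What you are describing seems to be removing duplicate rows from a data.frame, which doesn't require any aggregation. Based on your example, this is what you're after: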

unique(test.df[c(1,3,4)])
# id x1 x2
#1  A  1  A
#4  B  2  B
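With thousands of columns, selecting them by position (c(1,3,4)) gets error-prone. A minimal sketch, assuming (as in the example) that time is the only column that varies within an id: drop that column by name and let unique() remove the duplicate rows. The variable names keep and res are just illustrative.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps x2 as character
# regardless of the R version's default.
test.df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                      time = c(1:3, 1:3),
                      x1 = c(1, 1, 1, 2, 2, 2),
                      x2 = c("A", "A", "A", "B", "B", "B"),
                      stringsAsFactors = FALSE)

# Keep every column except the one that varies within an id.
keep <- setdiff(names(test.df), "time")

# unique() on a data.frame drops rows that are exact duplicates.
res <- unique(test.df[keep])
res
#   id x1 x2
# 1  A  1  A
# 4  B  2  B
```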

Edit:

I'm not quite sure what you mean by:

"I tried FUN=unique but it didn't seem to work."

Just to illustrate what mistake you might have made with aggregate, here is how to get the same result with aggregate:

test.df$x2 <- as.character(test.df$x2)
aggregate(. ~ id, FUN=unique , data = test.df[c(1,3,4)] )

#  id x1 x2
#1  A  1  A
#2  B  2  B
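One way FUN=unique can go wrong: for any group where a column is not actually constant, unique() returns more than one value, and aggregate() can then no longer return a simple vector column for it. A sketch of an alternative, assuming you are happy to take the first value per group: it uses aggregate()'s data-frame interface (x, by) with a hypothetical helper first_value that always returns exactly one value. Note that this silently hides any inconsistencies within an id.

```r
test.df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                      time = c(1:3, 1:3),
                      x1 = c(1, 1, 1, 2, 2, 2),
                      x2 = c("A", "A", "A", "B", "B", "B"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: the first element of a group's values.
first_value <- function(v) v[1]

# The data-frame interface applies FUN to each column of x separately,
# grouping the rows by the variables listed in `by`.
res <- aggregate(x = test.df[c("x1", "x2")],
                 by = list(id = test.df$id),
                 FUN = first_value)
res
#   id x1 x2
# 1  A  1  A
# 2  B  2  B
```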

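However, there is no need to use aggregate() here; it is very inefficient for this problem. You can check with system.time(.) that the difference shows up even on this small data: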

system.time(unique(test.df[c(1,3,4)]))
#    user  system elapsed 
#   0.001   0.000   0.001 
system.time(aggregate(. ~ id, FUN=unique , data = test.df[c(1,3,4)] ))
#    user  system elapsed 
#   0.004   0.000   0.004

Go ahead and run both on your million rows, check whether the results are identical, and compare the run times.
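A sketch of that check on larger, made-up data (hypothetical n_ids groups of three rows each, with x1 and x2 constant within every id); no timings are claimed here, so run it to see the numbers on your machine. It uses aggregate()'s data-frame interface with a first-element function rather than the formula call above. One detail matters for identical(): unique() keeps the original row names (1, 4, 7, ...) while aggregate() numbers its rows 1, 2, 3, ..., so the row names are reset before comparing.

```r
# Hypothetical synthetic data: n_ids groups of `reps` rows each,
# with x1 and x2 constant within every id.
n_ids <- 10000
reps  <- 3
big.df <- data.frame(
  # Zero-padded ids, so aggregate()'s sorted group order matches
  # the order in which unique() first sees each id.
  id   = rep(sprintf("id%05d", seq_len(n_ids)), each = reps),
  time = rep(seq_len(reps), times = n_ids),
  x1   = rep(seq_len(n_ids), each = reps),
  x2   = rep(sprintf("v%05d", seq_len(n_ids)), each = reps),
  stringsAsFactors = FALSE
)
cols <- c("id", "x1", "x2")

# Time both approaches.
t_unique <- system.time(u <- unique(big.df[cols]))
t_agg <- system.time(
  a <- aggregate(x = big.df[c("x1", "x2")],
                 by = list(id = big.df$id),
                 FUN = function(v) v[1])
)
print(t_unique)
print(t_agg)

# Reset row names so identical() compares only the data.
rownames(u) <- NULL
rownames(a) <- NULL
identical(u, a)
```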

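From your comment, I think you're confused about unique. As @mnel explained, it (unique.data.frame) removes every duplicated row from the given data.frame. It works in your case because, as you said, x1 and x2 each have the same value within each ID. So you don't need to know where in the data.frame each ID sits; you only need to pick one row per ID.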
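Since you only need one row per ID, here is a sketch of a variant that looks at the id column alone: duplicated() flags every repeat of an id after its first occurrence, so negating it keeps the first row of each id. Unlike unique() on the selected columns, this does not verify that the other columns really are constant; it simply takes the values from each id's first row. The name first_rows is just illustrative.

```r
test.df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                      time = c(1:3, 1:3),
                      x1 = c(1, 1, 1, 2, 2, 2),
                      x2 = c("A", "A", "A", "B", "B", "B"),
                      stringsAsFactors = FALSE)

# TRUE for the first row of each id, FALSE for later repeats.
first_rows <- !duplicated(test.df$id)

# Keep those rows and drop the column that varies within an id.
res <- test.df[first_rows, setdiff(names(test.df), "time")]
res
#   id x1 x2
# 1  A  1  A
# 4  B  2  B
```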
