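I want to use ddply to apply the same function to multiple columns, but I end up writing every column out in a single call. Is there a better way to do this?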

Here is a simplified version of the data:

data<-data.frame(TYPE=as.integer(runif(20,1,3)),A_MEAN_WEIGHT=runif(20,1,100),B_MEAN_WEIGHT=runif(20,1,10))

I want the sums of columns A_MEAN_WEIGHT and B_MEAN_WEIGHT, which I get like this:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT))

But in my actual data I have more than 8 "*_MEAN_WEIGHT" columns, and I am tired of writing them out 8 times:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT),MEAN_C=sum(C_MEAN_WEIGHT),MEAN_D=sum(D_MEAN_WEIGHT),MEAN_E=sum(E_MEAN_WEIGHT),MEAN_F=sum(F_MEAN_WEIGHT),MEAN_G=sum(G_MEAN_WEIGHT),MEAN_H=sum(H_MEAN_WEIGHT))

Is there a better way to write this? Thanks for your help!

2 Answers

The plyr-centric approach is to use colwise.

For example:

 ddply(data, .(TYPE), colwise(sum))
  TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1    1      319.8977      60.80317
2    2      621.6745      37.05863

If you only want a subset of the columns, you can pass their names as the .cols argument.
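
As a rough sketch of the subset case, the call below sums only two of three weight columns. The seed and the extra C_MEAN_WEIGHT column are assumptions added for illustration; they are not part of the question's sample data.

```r
library(plyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10),
                   C_MEAN_WEIGHT = runif(20, 1, 10))  # hypothetical extra column

# Sum only the A and B columns within each TYPE; C_MEAN_WEIGHT is left out
res_subset <- ddply(data, .(TYPE),
                    colwise(sum, .cols = c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT")))
res_subset
```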

You can also use numcolwise or catcolwise to operate only on numeric or categorical columns.
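
A minimal sketch of numcolwise, assuming the data also carries a non-numeric column (the LABEL column and the seed here are made up for illustration). Plain colwise(sum) would fail on such a column, while numcolwise(sum) skips it:

```r
library(plyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10))
data$LABEL <- rep(c("x", "y"), 10)  # hypothetical non-numeric column

# Only the numeric columns are summed; LABEL is dropped from the result
res_num <- ddply(data, .(TYPE), numcolwise(sum))
res_num
```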

Note that for the most basic uses of colwise you can use sapply instead:

ddply(data, .(TYPE), sapply, FUN = 'mean') 

The idiomatic data.table approach is to use lapply(.SD, fun).

For example:

library(data.table)
dt <- data.table(data)
dt[,lapply(.SD, sum) ,by = TYPE]
   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1:    2      621.6745      37.05863
2:    1      319.8977      60.80317
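
If the table holds other columns besides the weights, one possible way to restrict .SD is the .SDcols argument. The sketch below assumes the weight columns can be picked out by their "_MEAN_WEIGHT" suffix; the OTHER column, the seed, and the variable names are made up for illustration.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10),
                   OTHER = runif(20))  # hypothetical column we do not want summed
dt <- data.table(data)

# Collect every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(dt), value = TRUE)

# .SDcols limits .SD to those columns before lapply runs
res_dt <- dt[, lapply(.SD, sum), by = TYPE, .SDcols = weight_cols]
res_dt
```

Selecting by pattern means the call does not change as more "*_MEAN_WEIGHT" columns are added.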
answered 2013-04-18T23:25:23.223
Try this:

ddply(data, .(TYPE), colSums)

Here is a (slower) equivalent of the above, which can be adapted to use any function in place of sum:

ddply(data, .(TYPE), function(x) {apply(x, 2, sum)})

And if you want to keep the .(TYPE) column, you can do this:

ddply(data, .(TYPE), function(x) {apply(x[,names(x) != "TYPE"], 2, sum)})

Better yet, use data.table instead of plyr:

library(data.table)
dt = data.table(data)

# just sums
dt[, data.table(t(colSums(.SD))), by = TYPE]

# sum for "A" and "B", and sqrt(sum) for "C" and "D"
# note: you will have to call setnames() to fix the column names after
dt[, data.table(t(colSums(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE])),
                t(apply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE],
                        2, function(x) sqrt(sum(x))))),
     by = TYPE]
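
As a possible variation on the mixed-function case (not the form used above), j can return a named list built with c(lapply(...), lapply(...)), so the column names carry over and no setnames() call should be needed. The sample data with C and D columns and the seed are assumptions for illustration.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible
dt <- data.table(TYPE = as.integer(runif(20, 1, 3)),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10),
                 C_MEAN_WEIGHT = runif(20, 1, 10),   # hypothetical
                 D_MEAN_WEIGHT = runif(20, 1, 10))   # hypothetical

# sum for "A" and "B", sqrt(sum) for "C" and "D"; the list names become column names
res_mixed <- dt[, c(lapply(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE], sum),
                    lapply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE],
                           function(x) sqrt(sum(x)))),
                by = TYPE]
res_mixed
```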
answered 2013-04-18T18:35:38.353