r - How can I use ddply to get the weighted mean of classes in a data frame?

Question

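I am new to plyr and want to reshape a data frame with several variables by taking the weighted mean of the values within each class. With the following code I know how to do this for one variable, e.g. x2: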

library(plyr)
set.seed(123)
frame <- data.frame(class=sample(LETTERS[1:5], replace = TRUE),
                    x=rnorm(20), x2 = rnorm(20), weights=rnorm(20))
ddply(frame, .(class),function(x) data.frame(weighted.mean(x$x2, x$weights)))

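However, I would like the code to create a new data frame covering both x and x2 (and any number of variables in the frame). Does anybody know how to do this? Thanks.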

score 7 · Accepted Answer

You might find what you want in the summarise function (see ?summarise). I can replicate your code with summarise as follows:

library(plyr)
set.seed(123)
frame <- data.frame(class=sample(LETTERS[1:5], replace = TRUE), x=rnorm(20), 
                    x2 = rnorm(20), weights=rnorm(20))
ddply(frame, .(class), summarise, 
      x2 = weighted.mean(x2, weights))

To do the same for x, just add that line to the summarise call:

ddply(frame, .(class), summarise, 
      x = weighted.mean(x, weights),
      x2 = weighted.mean(x2, weights))
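
For reference, weighted.mean(x, w) evaluates to sum(x * w) / sum(w), so each summarise line above is that ratio computed within one class. A minimal check in base R follows. The vectors x and w are made up for illustration, and the weights are drawn with runif so that they are positive, which is an assumption and differs from the rnorm weights used in the question:

```r
set.seed(123)
x <- rnorm(20)
w <- runif(20)   # assumption: positive weights, unlike the question's rnorm weights

# Weighted mean written out by hand
manual <- sum(x * w) / sum(w)

# Built-in equivalent from the stats package
builtin <- weighted.mean(x, w)

all.equal(manual, builtin)
```

With weights that can be negative, as in the question's frame, sum(w) may land close to zero and the result can become very large, so it is probably worth checking the weights before relying on the output.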

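Edit: If you want to operate on many columns, use colwise or numcolwise instead of summarise, or run summarise on a melted data frame (reshape2 package) and then cast it back to the original form. Here is an example with colwise: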

wmean.vars <- c("x", "x2")

ddply(frame, .(class), function(x)
      colwise(weighted.mean, w = x$weights)(x[wmean.vars]))

Finally, if you would rather not specify wmean.vars, you can also do:

ddply(frame, .(class), function(x)
      numcolwise(weighted.mean, w = x$weights)(x[!colnames(x) %in% "weights"]))

This computes the weighted mean of every numeric field, excluding the weights themselves.
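
The melt/cast route mentioned in the edit is not shown above. A rough sketch of how it might look, assuming the reshape2 package is installed and using the same frame as in the question (the names long, wm, wide and the column name wmean are chosen here for illustration):

```r
library(plyr)
library(reshape2)

set.seed(123)
frame <- data.frame(class = sample(LETTERS[1:5], replace = TRUE),
                    x = rnorm(20), x2 = rnorm(20), weights = rnorm(20))

# Long format: class and weights are kept as id columns, while x and x2
# are stacked into a "variable" / "value" pair of columns
long <- melt(frame, id.vars = c("class", "weights"))

# One weighted mean per (class, variable) combination
wm <- ddply(long, .(class, variable), summarise,
            wmean = weighted.mean(value, weights))

# Back to wide format: one row per class, one column per original variable
wide <- dcast(wm, class ~ variable, value.var = "wmean")
wide
```

This takes more steps than colwise, but it may be convenient when the data are already in long format.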

score 3 · Answer

A data.table answer for fun, which also does not require specifying all the variables individually.

library(data.table)
frame <- as.data.table(frame)
keynames <- setdiff(names(frame),c("class","weights"))
frame[, lapply(.SD,weighted.mean,w=weights), by=class, .SDcols=keynames]

Result:

   class          x         x2
1:     B  0.1390808 -1.7605032
2:     D  1.3585759 -0.1493795
3:     C -0.6502627  0.2530720
4:     E  2.6657227 -3.7607866
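
To cross-check either approach without extra packages, the same computation can be sketched in base R with split and sapply. This is only an illustration: value.cols, by.class and result are names introduced here, and the sketch assumes frame is a plain data.frame with at least two value columns.

```r
set.seed(123)
frame <- data.frame(class = sample(LETTERS[1:5], replace = TRUE),
                    x = rnorm(20), x2 = rnorm(20), weights = rnorm(20))

# Every column except the grouping column and the weights
value.cols <- setdiff(names(frame), c("class", "weights"))

# Split the rows by class, then take a weighted mean of each value column
by.class <- split(frame, frame$class)
result <- t(sapply(by.class, function(d)
  sapply(d[value.cols], weighted.mean, w = d$weights)))

# Matrix with one row per class and one column per value column
result
```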
