
I am trying to calculate the sum of XYZmin values that occur on the same date (in R).

My data looks like this:

bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628, 
15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646, 
15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681, 
15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737, 
15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744, 
15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20, 
-15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10, 
-3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60, 
-10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10, 
-10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA, 
-46L))

head(bar)   
        date XYZmin
1 2012-10-09    -20
2 2012-10-09    -15
3 2012-10-09    -10
4 2012-10-15    -70
5 2012-10-19    -60
6 2012-10-22    -60

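What I am trying to accomplish is to create a new variable, XYZtot, where, for dates that occur more than once, the second row holds the sum of the first and second values, the third row holds the sum of the first, second and third values, and so on. Here is a snippet of my goal: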

head(new_bar_with_XYZtot) 

        date XYZmin XYZtot
1 2012-10-09    -20    -20
2 2012-10-09    -15    -35
3 2012-10-09    -10    -45
4 2012-10-15    -70    -70
5 2012-10-19    -60    -60
6 2012-10-22    -60    -60
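To pin down the rule described above, here is a deliberately naive loop over the first six rows shown by head(bar). It assumes the rows are already sorted by date (as they are in the question's data); the names bar6 and running_total_by_date are made up for illustration. The running total keeps accumulating while the date stays the same and restarts when the date changes.

```r
# Six-row sample copied from head(bar) in the question
bar6 <- data.frame(
  date   = as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                     "2012-10-15", "2012-10-19", "2012-10-22")),
  XYZmin = c(-20, -15, -10, -70, -60, -60)
)

# Hypothetical helper: running total that restarts whenever the date changes.
# Assumes the rows are sorted by date.
running_total_by_date <- function(dates, values) {
  out <- numeric(length(values))
  for (i in seq_along(values)) {
    if (i > 1 && dates[i] == dates[i - 1]) {
      out[i] <- out[i - 1] + values[i]  # same date as previous row: keep adding
    } else {
      out[i] <- values[i]               # new date: start a fresh total
    }
  }
  out
}

bar6$XYZtot <- running_total_by_date(bar6$date, bar6$XYZmin)
print(bar6)
```

On this sample the XYZtot column comes out as -20, -35, -45, -70, -60, -60, the same values a grouped cumsum gives.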

UPDATE: microbenchmark test

library(plyr)  # provides ddply()
alexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin))

Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})

agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))

# install.packages("data.table", dependencies = TRUE)
library(data.table)
# braces keep both statements inside the function body
mnel <- function(bar,date,XYZmin) { bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date] }

# install.packages("microbenchmark", dependencies = TRUE)
require(microbenchmark)

# run test
res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666)


## Print results:
print(res)

The numbers:

Unit: microseconds
                        expr       min        lq    median        uq       max neval
 alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126   666
     Arun(bar, date, XYZmin)   963.632  1018.311  1070.759  1138.655  4988.226   666
  agstudy(bar, date, XYZmin)  1967.292  2021.115  2078.261  2158.689  9240.500   666
     mnel(bar, date, XYZmin)   251.312   270.295   282.821   325.040  6540.367   666


### Plot results:
boxplot(res)

[Image: boxplot of the microbenchmark results]
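Before reading too much into the timings, it may be worth confirming that the candidates compute the same column. This is a minimal base-R sketch that only covers the two ave()-based versions (so no extra packages are needed), run on a six-row sample copied from head(bar); the names bar6, via_within and via_transform are made up for illustration.

```r
# Six-row sample copied from head(bar) in the question
bar6 <- data.frame(
  date   = as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                     "2012-10-15", "2012-10-19", "2012-10-22")),
  XYZmin = c(-20, -15, -10, -70, -60, -60)
)

# Same expressions as the bodies of Arun() and agstudy() in the benchmark
via_within    <- within(bar6, {XYZtot <- ave(XYZmin, date, FUN = cumsum)})
via_transform <- transform(bar6, XYZtot = ave(XYZmin, date, FUN = cumsum))

# Compare only the new column; TRUE means both approaches agree
isTRUE(all.equal(via_within$XYZtot, via_transform$XYZtot))
```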


4 Answers


I would propose a data.table solution:

library(data.table)
bar <- data.table(bar)

# assigning within bar
bar[, XYZmin.sum := cumsum(XYZmin), by = date]

This will work well on big data!

answered 2013-03-12T23:32:58.387

Here is one using ave:

bar <- within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})
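The reason ave() fits here is that it splits the vector by group, applies FUN to each piece, and writes the results back into the original positions. The sketch below mimics that behaviour with a hypothetical my_ave() built on split() and the split<- replacement function; this is roughly, but not guaranteed to be exactly, what base R does internally. The dates are deliberately unsorted to show that the per-date running sums still land in the right rows.

```r
# Hypothetical re-implementation for illustration only:
# split x by group, apply FUN to each piece, write the pieces back in place.
my_ave <- function(x, g, FUN) {
  split(x, g) <- lapply(split(x, g), FUN)
  x
}

# Unsorted dates: rows 1, 3 and 4 share 2012-10-09
d <- as.Date(c("2012-10-09", "2012-10-15", "2012-10-09", "2012-10-09"))
x <- c(-20, -70, -15, -10)

my_ave(x, d, cumsum)          # per-date running sums, in the original row order
ave(x, d, FUN = cumsum)       # base R gives the same result
```

Because results are written back by position, neither version needs the data to be sorted by date, unlike a loop that compares each row with the previous one.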
answered 2013-03-12T20:39:05.160

Also using ave, but with transform:

transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))

EDIT after OP's comment:

transform(bar, XYZtot = ave(XYZmin, date, FUN = 
                          function(x)
                            if(length(x) < 1) NA 
                            else c(cumsum(x[-length(x)]),NA)))
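To see what the edited FUN does per date, it can be pulled out and called on single groups. The name shifted_cumsum is hypothetical; the body is copied from the edit. Within each date it returns the running sum of all values except the last, followed by NA, so a date that occurs only once gets NA.

```r
# The anonymous FUN from the edited answer, given a hypothetical name
shifted_cumsum <- function(x)
  if (length(x) < 1) NA else c(cumsum(x[-length(x)]), NA)

shifted_cumsum(c(-20, -15, -10))  # the three rows dated 2012-10-09
shifted_cumsum(-70)               # a date that occurs only once
```

Since ave() never passes an empty group, the length(x) < 1 branch does not appear to be reached in practice.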
answered 2013-03-12T20:40:18.220

Is this what you're after?

library(plyr)  # provides ddply()
bar.sum <- ddply(bar, .(date), transform,
                 XYZmin.sum = cumsum(XYZmin))
bar.sum
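ddply() follows the split-apply-combine pattern: split the data frame by date, run transform on each piece, and stack the pieces again. As a rough base-R analogue (not plyr's actual implementation), the same steps can be written with split(), lapply() and rbind(), shown here on the six rows from head(bar); bar6 and pieces are illustrative names. As with ddply, the rows come back grouped by date rather than in the original row order, which only matters if the input is unsorted.

```r
# Six-row sample copied from head(bar) in the question
bar6 <- data.frame(
  date   = as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                     "2012-10-15", "2012-10-19", "2012-10-22")),
  XYZmin = c(-20, -15, -10, -70, -60, -60)
)

# Split by date, add the running sum to each piece, then stack the pieces
pieces  <- split(bar6, bar6$date)
pieces  <- lapply(pieces, function(d) transform(d, XYZmin.sum = cumsum(XYZmin)))
bar.sum <- do.call(rbind, pieces)
rownames(bar.sum) <- NULL  # drop the "date.row" names that rbind creates
bar.sum
```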
answered 2013-03-12T20:37:34.037