
My aggregate() call that gives me mean sales by month works fine:

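library(chron)
set.seed(42)
# 1000 sales values; dates are 250 consecutive days (2010-01-01 to 2010-09-07),
# repeated 4 times so both columns have 1000 rows
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))
# mean sales per month, grouped by chron's months()
aggregate(sales ~ months(as.chron(dates)), mean, data = dat)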

...and produces the following output:

months(as.chron(dates))     sales
1                     Jan 1000.0723
2                     Feb  999.1580
3                     Mar  995.3055
4                     Apr 1000.4912
5                     May 1003.9703
6                     Jun  997.1086
7                     Jul  996.5939
8                     Aug  998.5012
9                     Sep 1001.3709
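For comparison, here is a minimal base-R sketch of the same aggregation without chron. It is an assumption on my part, not part of the original question: it swaps chron's months() for format(dates, "%m"), which returns locale-independent month numbers, and maps those onto month.abb so the grouping factor sorts in calendar order (Jan, Feb, ...) rather than alphabetically. The column name month is hypothetical.

```r
# Recreate the question's data
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# Hypothetical helper column: month number ("01".."12") mapped to an
# English abbreviation, with factor levels in calendar order
dat$month <- factor(month.abb[as.integer(format(dat$dates, "%m"))],
                    levels = month.abb)

# aggregate() only returns the months that actually occur (Jan..Sep here)
by_month <- aggregate(sales ~ month, data = dat, FUN = mean)
print(by_month)
```

Because the factor levels follow month.abb, the rows come out Jan through Sep, the same order as chron's months() output above.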

My understanding is that the following cast() statement should produce the same output:

library(reshape)
cast(dat, months(as.chron(dates)) ~ ., mean, value="sales")

Instead, it returns the following error:

Error: Casting formula contains variables not found in molten data: months(as.chron(dates))

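I may be missing something, but is it possible to use chron's months() call inside a cast() statement? The following two statements accomplish the same thing with cast(), but I'm trying to do it in one step and get a better understanding of how cast works.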

dat$mont <- months(as.chron(dat$dates))
cast(dat, mont ~ ., mean, value="sales")
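To illustrate why the two-step version works, here is a conceptual base-R sketch of what a cast of the form mont ~ . with mean amounts to: split the value column by an existing grouping column, then apply the summary function to each piece. This is only an analogy, not reshape's actual implementation, and it assumes format(dates, "%m") as a stand-in for chron's months(). The key point is that the grouping variable has to exist as a real column first, which matches the error message above.

```r
# Recreate the question's data
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# Step 1: materialise the grouping column (two-digit month number)
dat$mont <- format(dat$dates, "%m")

# Step 2: split the value column by that column and apply mean to each group
groups <- split(dat$sales, dat$mont)
mean_by_mont <- vapply(groups, mean, numeric(1))
print(mean_by_mont)
```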

Thanks in advance, --JT


1 Answer


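This works with reshape2, whose dcast() accepts an expression such as months(as.chron(dates)) in the casting formula: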

library(reshape2)
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales")
##   months(as.chron(dates))        NA
## 1                     Jan 1004.5404
## 2                     Feb 1002.3146
## 3                     Mar  996.0883
## 4                     Apr  994.1707
## 5                     May 1000.4652
## 6                     Jun 1002.8020
## 7                     Jul  996.0357
## 8                     Aug 1001.6754
## 9                     Sep  997.6772

Or you can use plyr:

library(plyr)
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales))
##  months     sales
## 1   Jan 1004.5404
## 2   Feb 1002.3146
## 3   Mar  996.0883
## 4   Apr  994.1707
## 5   May 1000.4652
## 6   Jun 1002.8020
## 7   Jul  996.0357
## 8   Aug 1001.6754
## 9   Sep  997.6772

Or use data.table:

library(data.table)
DT <- data.table(dat)
DT[, month := months(as.chron(dates))][,list(sales =  mean(sales)),by = month]
##    month     sales
## 1:   Jan 1004.5404
## 2:   Feb 1002.3146
## 3:   Mar  996.0883
## 4:   Apr  994.1707
## 5:   May 1000.4652
## 6:   Jun 1002.8020
## 7:   Jul  996.0357
## 8:   Aug 1001.6754
## 9:   Sep  997.6772

Comment from Matthew Dowle:

:= isn't needed, iiuc; by accepts an expression directly:

DT[, list(sales=mean(sales)), by=months(as.chron(dates))]
##    months     sales
## 1:    Jan 1004.5404
## 2:    Feb 1002.3146
## 3:    Mar  996.0883
## 4:    Apr  994.1707
## 5:    May 1000.4652
## 6:    Jun 1002.8020
## 7:    Jul  996.0357
## 8:    Aug 1001.6754
## 9:    Sep  997.6772
answered 2012-07-06T02:06:59.387