
Possible duplicate:
Beginner tips on using plyr to calculate year-over-year change across groups

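What is a good way to compute the year-over-year difference (as a new variable) of an existing data frame variable (i.e. Sales) across several grouping variables (i.e. region and food type)?

Here is an example of the data frame structure: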

Date        Region  Type     Sales
1/1/2001    East    Food     120
1/1/2001    West    Housing  130
1/1/2001    North   Food     130
1/2/2001    East    Food     133
1/3/2001    West    Housing  140
1/4/2001    North   Food     150
….
….
1/29/2013   East    Food     125
1/29/2013   West    Housing  137
1/29/2013   North   Food     1350

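In addition to differencing the data, I would also like to compute a trailing (say 7-day) moving average.

Any guidance would be greatly appreciated.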


1 Answer


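Here is something to get you started. data.table is a great package for this kind of task, because it offers a concise and easy-to-use syntax (once you are past the learning curve).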

library(data.table)

Create a reproducible example

set.seed(128)
regions = c("East", "West", "North", "South")
types = c("Food", "Housing")
dates <- seq(as.Date('2009-01-01'), as.Date('2011-12-31'), by = 1)
n <- length(dates)
dt <- data.table(Date = dates, 
                 Region = sample(regions, n, replace = TRUE),
                 Type = sample(types, n, replace = TRUE),
                 Sales = round(rnorm(n, mean = 100, sd = 10)))

Add a year column

dt[, Year := year(Date)]

> dt
            Date Region    Type Sales Year
   1: 2009-01-01   West    Food   119 2009
   2: 2009-01-02  North Housing   102 2009
   3: 2009-01-03  North Housing   102 2009
   4: 2009-01-04  North    Food   101 2009
   5: 2009-01-05   West    Food   101 2009
  ---
1091: 2011-12-27   East Housing   122 2011
1092: 2011-12-28   East Housing    88 2011
1093: 2011-12-29  North    Food   115 2011
1094: 2011-12-30   West Housing    96 2011
1095: 2011-12-31   East    Food   101 2011
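The question also asks for a trailing 7-day moving average, which is computed on the daily data rather than on the yearly totals. What follows is only a minimal sketch, assuming a data.table version that provides frollmean (1.12.0 or later) and assuming each Region/Type group has exactly one row per calendar day, so that a 7-row window is also a 7-day window. The random dt above does not satisfy that assumption, so the sketch builds its own small table; the names daily and Sales.MA7 are made up for illustration.

```r
library(data.table)

# Hypothetical daily table: a single group with ten consecutive days
daily <- data.table(
  Date   = seq(as.Date("2009-01-01"), by = 1, length.out = 10),
  Region = "East",
  Type   = "Food",
  Sales  = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# Sort by date within each group so the window trails in time order
setorder(daily, Region, Type, Date)

# Trailing 7-row mean per Region/Type group. The first six rows of each
# group are NA because the window is not yet full.
daily[, Sales.MA7 := frollmean(Sales, n = 7, align = "right"),
      by = .(Region, Type)]
```

If some days are missing within a group, the rows would need to be padded to a complete daily sequence first (or a date-based window used instead), otherwise the window spans more than 7 days.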

Compute the yearly summary

summary <- dt[, list(Sales = sum(Sales)), by = 'Year,Region,Type']
setkey(summary, 'Year')

> head(summary)
   Year Region    Type Sales
1: 2009   West    Food  4791
2: 2009  North Housing  3517
3: 2009  North    Food  6774
4: 2009  South Housing  4380
5: 2009   East    Food  4144
6: 2009   West Housing  4275

A function that computes the year-over-year difference for each region/product combination.

YoYdiff <- function(dt) {
  # Calculate year-on-year difference for Sales column
  data.table(Sales.Diff = diff(dt$Sales), Year = dt$Year[-1])
}

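Compute the year-over-year difference by group. This works for my example because setkey(summary, 'Year') sorts the summary table by year, but if your data is missing years for some product/region combinations, you will have to be more careful, since diff() would then subtract non-adjacent years.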

> summary[, YoYdiff(.SD), by = 'Region,Type']
    Region    Type Sales.Diff Year
 1:   West    Food       -412 2010
 2:   West    Food        121 2011
 3:  North Housing       1907 2010
 4:  North Housing      -1457 2011
 5:  North    Food      -3087 2010
 6:  North    Food        369 2011
 7:  South Housing       -539 2010
 8:  South Housing        575 2011
 9:   East    Food       1264 2010
10:   East    Food      -1732 2011
11:   West Housing        298 2010
12:   West Housing       -410 2011
13:  South    Food       -889 2010
14:  South    Food       1045 2011
15:   East Housing       1146 2010
16:   East Housing       1169 2011
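For the case where some region/product combinations are missing years, one option is to look up the previous year explicitly instead of relying on row order. This is a sketch of that alternative and not part of the approach above: it swaps diff() for a self-join on Year + 1. The table yearly and the columns Sales.Prev and Sales.Diff are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical yearly totals; East/Food has no row for 2010
yearly <- data.table(
  Year   = c(2009L, 2011L, 2009L, 2010L, 2011L),
  Region = c("East", "East", "West", "West", "West"),
  Type   = "Food",
  Sales  = c(100, 130, 200, 210, 190)
)

# Copy of the table with Year shifted forward by one, so that each row
# lines up with the following year of the same Region/Type
prev <- yearly[, .(Year = Year + 1L, Region, Type, Sales.Prev = Sales)]

# Left join: rows with no previous-year match get NA for Sales.Prev
yoy <- merge(yearly, prev, by = c("Year", "Region", "Type"), all.x = TRUE)
yoy[, Sales.Diff := Sales - Sales.Prev]
```

Here East/Food in 2011 gets NA rather than 30, because there is no 2010 row to compare against; diff() on the same group would have returned the 2011 minus 2009 difference.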
answered 2013-01-29T22:57:47.340