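1) Using the data defined in the Note at the end, use rollapply as shown below. nms holds the names of the columns over which to perform the rolling window calculation; it could instead be given as column indexes (i.e. nms <- 4:5). Sum (capital S) is like sum except that it returns NA, rather than 0, when given a series that is entirely NA; otherwise it performs sum(x, na.rm = TRUE). Note that the NA values appended in roll are there so that the series is never shorter than the window width.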
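library(data.table)
library(zoo)
k <- 2 # prior two months
# like sum(x, na.rm = TRUE), but NA rather than 0 when every value is NA
Sum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
# offsets 1:k look at the next k rows, which hold the prior months because
# each company's rows are sorted newest first; k NAs are appended as padding
roll <- function(x) rollapply(c(x, rep(NA, k)), list(1:k), Sum)
nms <- names(mytable)[4:5]
mytable[, (nms) := lapply(.SD, roll), .SDcols = nms, by = "Company"]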
giving:
> mytable
Month Year Company ProducedCereals CommercialsShown
1: 6 2016 Kellog 15 19
2: 5 2016 Kellog 12 4
3: 4 2016 Kellog NA NA
4: 6 2016 General Mills 7 19
5: 5 2016 General Mills NA NA
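To see which values the offsets list(1:k) pick up, the same calculation can be written out in base R for a single column. This is only an illustrative sketch: forward_sum is a hypothetical helper, not part of zoo, and it assumes the rows are sorted newest first, as in mytable.

```r
# NA for an all-NA input, otherwise an NA-ignoring sum
Sum <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Hypothetical helper: for each position i, sum the k values that follow it
# (offsets 1..k). Positions past the end of x are padded with NA.
forward_sum <- function(x, k) {
  padded <- c(x, rep(NA_real_, k))
  sapply(seq_along(x), function(i) Sum(padded[i + seq_len(k)]))
}

kellog <- c(6, 3, 12)            # ProducedCereals for Kellog: June, May, April
res <- forward_sum(kellog, 2)
print(res)

# Plain sum() would report 0 for an empty month range instead of NA
print(sum(c(NA, NA), na.rm = TRUE))
print(Sum(c(NA, NA)))
```

The last row of each company has no prior months at all, which is why it comes out as NA rather than 0.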
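1a) The comments raised the case of missing rows, and of using only the two most recent calendar months before the current row, so that any given sum might use fewer than 2 rows.
In this case it is more convenient to sort the data first by Company and then by date in ascending order, which means we want right alignment rather than left alignment in rollapply.
We pass a zoo object with a yearmon index to rollapply so that there is a time index that Sum2 can inspect to subset its input to the desired window. We use a window of size 3 (that is, k+1) and only sum the values in that window whose times fall within the specified range. We specify coredata = FALSE to rollapply so that both the data and the index are passed to the function being applied, not just the data.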
k <- 2 # prior 2 months
# inputs zoo object x, subsets it to specified window and sums
Sum2 <- function(x) {
w <- window(x, start = end(x) - k/12, end = end(x) - 1/12)
if (length(w) == 0 || all(is.na(w))) NA_real_ else sum(w, na.rm = TRUE)
}
nms <- names(mytable)[4:5]
setkey(mytable, Company, Year, Month) # sort
# create zoo object from arguments and run rollapplyr using Sum2
roll2 <- function(x, year, month) {
z <- zoo(x, as.yearmon(year + (month - 1)/12))
coredata(rollapplyr(z, k+1, Sum2, coredata = FALSE, partial = TRUE))
}
mytable[, (nms) := lapply(.SD, roll2, Year, Month), .SDcols = nms, by = "Company"]
giving:
> mytable
Month Year Company ProducedCereals CommercialsShown
1: 5 2016 General Mills NA NA
2: 6 2016 General Mills 7 19
3: 4 2016 Kellog NA NA
4: 5 2016 Kellog 12 4
5: 6 2016 Kellog 15 19
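The calendar-window idea behind Sum2 can also be sketched without zoo, which may make the handling of missing rows easier to follow. The helper prior_months_sum below is hypothetical and only an illustration under the assumption that year and month are numeric vectors for a single company; it is not the code used above.

```r
# Hypothetical helper: for each row, sum the values whose month falls in the
# k calendar months strictly before that row. Rows may be missing.
prior_months_sum <- function(value, year, month, k = 2) {
  idx <- year * 12 + month          # put all months on one integer scale
  sapply(seq_along(value), function(i) {
    in_window <- idx >= idx[i] - k & idx < idx[i]
    w <- value[in_window]
    if (length(w) == 0 || all(is.na(w))) NA_real_ else sum(w, na.rm = TRUE)
  })
}

# Kellog, all three months present (April, May, June)
full_res <- prior_months_sum(c(12, 3, 6), c(2016, 2016, 2016), c(4, 5, 6))
print(full_res)

# May is missing here, so June can only draw on April
gap_res <- prior_months_sum(c(12, 6), c(2016, 2016), c(4, 6))
print(gap_res)
```

Because the window is defined by dates rather than by row positions, a missing month simply contributes nothing to the sum.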
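1b) Another approach to missing rows is to convert the data to long form and then to a rectangular form, filling the missing cells with NA. This works as long as the same month and year is not missing for every company.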
k <- 2 # sum over k prior months
m <- melt(mytable, id = 1:3)
dd <- as.data.frame.table(tapply(m$value, m[, 1:4, with = FALSE], c),
responseName = "value")
Sum1 <- function(x) {
x <- head(x, -1)
if (length(x) == 0 || all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}
setDT(dd)[, value := rollapplyr(value, k+1, Sum1, partial = TRUE),
by = .(Company, variable)]
dc <- as.data.table(dcast(... ~ variable, data = dd, value.var = "value"))
setkey(dc, Company, Year, Month)
dc
giving:
Month Year Company ProducedCereals CommercialsShown
1: 4 2016 General Mills NA NA
2: 5 2016 General Mills NA NA
3: 6 2016 General Mills 7 19
4: 4 2016 Kellog NA NA
5: 5 2016 Kellog 12 4
6: 6 2016 Kellog 15 19
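The rectangular fill itself can be illustrated in isolation. The sketch below uses expand.grid and merge from base R rather than the melt/tapply route above, so it is a swapped-in technique shown only to make the idea concrete; obs, grid and full are illustrative names.

```r
# Observed rows: April is missing for General Mills
obs <- data.frame(
  Month = c(4, 5, 6, 5, 6),
  Company = c("Kellog", "Kellog", "Kellog", "General Mills", "General Mills"),
  ProducedCereals = c(12, 3, 6, 7, 5),
  stringsAsFactors = FALSE
)

# Every Month x Company combination that appears anywhere in the data
grid <- expand.grid(Month = sort(unique(obs$Month)),
                    Company = unique(obs$Company),
                    stringsAsFactors = FALSE)

# Left join: combinations with no observed row get NA
full <- merge(grid, obs, by = c("Month", "Company"), all.x = TRUE)
full <- full[order(full$Company, full$Month), ]
print(full)
```

A month that is missing for every company never enters the grid, which is the limitation noted above.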
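2) Another possibility is to convert mytable to a zoo object z, split by company, and then use rollapply on that. Again, mytable is as shown in the Note at the end. Sum is the same as in (1).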
k <- 2 # prior 2 months
ym <- function(m, y) as.yearmon(paste(m, y), format = "%m %Y")
z <- read.zoo(mytable, index = 1:2, split = 3, FUN = ym) # split on column 3 (Company)
Sum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
rollapply(z, list(-1:-k), Sum, partial = TRUE, fill = NA)
giving:
ProducedCereals.General Mills CommercialsShown.General Mills
Apr 2016 NA NA
May 2016 NA NA
Jun 2016 7 19
ProducedCereals.Kellog CommercialsShown.Kellog
Apr 2016 NA NA
May 2016 12 4
Jun 2016 15 19
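The wide layout produced by splitting on company, with one column per company and one row per month, can be pictured with a small base R sketch. It uses tapply as a stand-in and is not what read.zoo does internally; long and wide are illustrative names.

```r
long <- data.frame(
  Month = c(6, 5, 4, 6, 5),
  Company = c("Kellog", "Kellog", "Kellog", "General Mills", "General Mills"),
  ProducedCereals = c(6, 3, 12, 5, 7),
  stringsAsFactors = FALSE
)

# One row per month, one column per company; empty cells become NA
wide <- tapply(long$ProducedCereals,
               list(Month = long$Month, Company = long$Company),
               sum)
print(wide)
```

Once every company shares the same set of rows, a rolling calculation down each column sees missing months as NA rather than skipping them.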
Note: The code in the question does not generate the data shown in the question, so we used this for the data.table mytable:
library(data.table)
mytable <-
structure(list(Month = c(6, 5, 4, 6, 5), Year = c(2016, 2016,
2016, 2016, 2016), Company = c("Kellog", "Kellog", "Kellog",
"General Mills", "General Mills"), ProducedCereals = c(6, 3,
12, 5, 7), CommercialsShown = c(12, 15, 4, 20, 19)), .Names = c("Month",
"Year", "Company", "ProducedCereals", "CommercialsShown"), row.names = c(NA,
-5L), class = "data.frame")
mytable <- as.data.table(mytable)