r - Avoiding for loops when aggregating time series

Question

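I understand why vectorised functions are better than for loops.

But there are some problems for which I can't see a vectorised or functional-programming solution. One of them is summing monthly data to get quarterly data. Any suggestions for replacing this code...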

month <- 1:100
A422072L <- c(rep(NA, 4), rnorm(96, 100, 5) ) + 2 * month
A422070J <- c(NA, NA, rnorm(96, 100, 5), NA, NA) + 2 * month
Au.approvals <- data.frame(month=month, A422072L=A422072L, A422070J=A422070J)

Au.approvals$trend.sum.A422072L.qtr <- NA
Au.approvals$sa.sum.A422070J.qtr <- NA
for(i in seq_len(nrow(Au.approvals)))
{
    if(i < 3) next
    if(all(!is.na(Au.approvals$A422072L[(i-2):i])))
        Au.approvals$trend.sum.A422072L.qtr[i] <- sum(Au.approvals$A422072L[(i-2):i])
    if(all(!is.na(Au.approvals$A422070J[(i-2):i])))
        Au.approvals$sa.sum.A422070J.qtr[i]    <- sum(Au.approvals$A422070J[(i-2):i])
}

print(Au.approvals)

There should be enough data here to run it as an example.

score 4 · Accepted Answer

Let's create some fake time series data:

time_dat = data.frame(t = 1:100, value = runif(100))

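To get a rolling sum, take a look at rollapply from the zoo package:

library(zoo)
# fill = NA pads the positions where the window does not fit;
# the window is centered by default (align = "center")
time_dat = transform(time_dat, 
                     roll_value = rollapply(value, 10, sum, fill = NA))

Here I assume that the coarser resolution (quarterly in your case) is 10 times coarser than the finer resolution.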
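The loop in the question sums the current month and the two before it, so a right-aligned window of 3 may be closer to what was asked than a window of 10. A minimal sketch under that assumption follows; the seed and the variable names (x, loop_sum, zoo_sum, base_sum) are only illustrative and not from the original post. It compares the loop against rollapplyr from zoo and against a one-sided stats::filter from base R:

```r
library(zoo)

set.seed(1)  # illustrative seed, only so the sketch is reproducible
month <- 1:100
x <- c(rep(NA, 4), rnorm(96, 100, 5)) + 2 * month

# Reference: the question's loop, reduced to a single column
loop_sum <- rep(NA_real_, length(x))
for (i in 3:length(x)) {
  if (all(!is.na(x[(i - 2):i]))) loop_sum[i] <- sum(x[(i - 2):i])
}

# zoo: right-aligned window of 3; sum() yields NA when a window contains an NA,
# and fill = NA pads the first two positions
zoo_sum <- rollapplyr(x, width = 3, FUN = sum, fill = NA)

# base R: one-sided convolution filter with weights 1, 1, 1
base_sum <- as.numeric(stats::filter(x, rep(1, 3), sides = 1))

all.equal(loop_sum, zoo_sum)
all.equal(loop_sum, base_sum)
```

Both vectorised versions leave NA wherever the 3-month window is incomplete or contains a missing value, which is what the if(all(!is.na(...))) guard in the loop does.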

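The original answer, for non-rolling aggregation:

I like to use the functions from the plyr package, but ave, aggregate and data.table are also good options. For large datasets, data.table is very fast. But back to some plyr magic:

First, create an additional column that specifies the coarser time frequency, i.e. which quarter each observation falls in: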

time_dat[["coarse_t"]] = rep(1:10, each = 10)
> head(time_dat)
  t     value coarse_t
1 1 0.9045097        1
2 2 0.4174182        1
3 3 0.5638139        1
4 4 0.8228698        1
5 5 0.7059027        1
6 6 0.5285386        1

Now we can aggregate time_dat to the coarser time frequency:

library(plyr)
time_dat_coarse = ddply(time_dat, .(coarse_t), summarise, sum_value = sum(value))
> time_dat_coarse
   coarse_t sum_value
1         1  6.097348
2         2  4.834720
3         3  3.988809
4         4  4.170656
5         5  4.538269
6         6  6.198716
7         7  4.399282
8         8  5.507384
9         9  6.089072
10       10  4.663287
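The same grouping idea can also be sketched in base R with aggregate or tapply, without plyr. The example below assumes real quarters of three months, with month 1 being the first month of a quarter; the data frame and column names (monthly, quarter) are made up for illustration:

```r
# Twelve months of illustrative data
monthly <- data.frame(month = 1:12, value = 1:12)

# Integer division maps months 1-3 to quarter 1, 4-6 to quarter 2, and so on
monthly$quarter <- (monthly$month - 1) %/% 3 + 1

# One row per quarter, holding the sum of that quarter's monthly values
quarterly <- aggregate(value ~ quarter, data = monthly, FUN = sum)
print(quarterly)

# tapply gives the same sums as a named vector
quarterly_vec <- tapply(monthly$value, monthly$quarter, sum)
```

With the values 1 to 12 the quarterly sums are 6, 15, 24 and 33, which is easy to check by hand.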

score 1 · Answer

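Paul's answer is great, but I just want to add that the chron package has many excellent date/time classification operations that can be paired with plyr for aggregation: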

library("chron") 
# chron uses chron-specific object representation. 
# If a different representation is needed, a conversion is necessary
# e.g. if a$date is a chron date object, I would use as.POSIXct(a$date) to get a POSIXct representation

# create chron date objects and values
a<-data.frame(date=as.chron(Sys.Date() + 1:1000), value = 1:100*runif(100,0,1))

# cuts dates into 15 intervals
a$interval1<-cut(a$date,15)
# cuts dates into 10 number of intervals using a label you define
a$interval2<-cut(a$date,10,paste("group",1:10))
# cuts dates into weeks
a$weeks<-cut(a$date,"weeks",start.on.monday=FALSE)
# cuts dates into months
a$months<-cut(a$date,"months")
# cuts dates into years
a$years<-cut(a$date,"years")
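# classifies day based on day of week (0 = Sunday);
# day.of.week() takes month, day and year, so split the dates first
a$day_of_week<-do.call(day.of.week, month.day.year(a$date))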

# creating a chron time object
b<-data.frame(day_time=as.chron(Sys.time()+1:1000*100), value = 1:100*runif(100,0,1))
# cuts times into days - note: uses first time period as the start
b$day<-cut(b$day_time,"days")
# truncates time to 5 minute interval
b$min_5<-trunc(b$day_time, "00:05:00")
# truncates time to 1 hour intervals
b$hour1<-trunc(b$day_time, "01:00:00")
# truncates datetime to 1 hour and 2 second intervals
b$days_3<-trunc(b$day_time, "01:00:02")

I use chron a lot because it makes time aggregation much easier.
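To make the pairing with plyr concrete, here is a rough sketch that cuts chron dates into months and sums within each month with ddply. The data (60 consecutive days starting 2020-01-01, each with value 1) and the names d, month and total are assumptions for illustration only:

```r
library(chron)
library(plyr)

# 60 consecutive days: 31 in January 2020 and 29 in February 2020
d <- data.frame(date = as.chron(as.Date("2020-01-01") + 0:59), value = 1)

# cut() on chron dates returns an ordered factor with one level per month
d$month <- cut(d$date, "months")

# Sum the values within each month
monthly <- ddply(d, .(month), summarise, total = sum(value))
print(monthly)
```

Any of the other cut() or trunc() columns created above could be used as the grouping variable in the same way.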

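In addition, the zoo and xts packages have many more functions that are well suited to all kinds of aggregation beyond the daily level of detail. Their documentation is huge, and it can be hard to find what you want, but almost everything you could want is there. Some highlights: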

library("zoo")
library("xts")
?rollapply
?rollsum
?rollmean
?rollmedian
?rollmax
?yearmon
?yearqtr
?apply.daily
?apply.weekly
?apply.monthly
?apply.quarterly
?apply.yearly
?to.minutes
?to.minutes3
?to.minutes5
?to.minutes10
?to.minutes15
?to.minutes30
?to.hourly
?to.daily
?to.weekly
?to.monthly
?to.quarterly
?to.yearly
?to.period
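As a small illustration of two entries from the list, yearmon and yearqtr, the sketch below builds a monthly zoo series and sums it to quarters with aggregate. The series m and its values are made up for the example; this is one possible way to do it, not the only one:

```r
library(zoo)

# Monthly series for 2020 with illustrative values 1..12, indexed by yearmon
m <- zoo(1:12, as.yearmon(2020 + (0:11) / 12))

# aggregate() on a zoo object accepts a function that maps the index to groups;
# as.yearqtr converts each month to the quarter it belongs to
q <- aggregate(m, as.yearqtr, sum)
print(q)
```

The result is itself a zoo series indexed by yearqtr, so it can be fed into further zoo or xts functions.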
