我有一个 CSV 文件,其中包含此时发生的时间戳和某些事件类型。我想要的是每隔 6 分钟计算某些事件类型的发生次数。
输入数据如下所示:
date,type
"Sep 22, 2011 12:54:53.081240000","2"
"Sep 22, 2011 12:54:53.083493000","2"
"Sep 22, 2011 12:54:53.084025000","2"
"Sep 22, 2011 12:54:53.086493000","2"
我用这段代码加载和处理数据:
> raw_data <- read.csv('input.csv')
> cured_dates <- c(strptime(raw_data$date, '%b %d, %Y %H:%M:%S', tz="CEST"))
> cured_data <- data.frame(cured_dates, c(raw_data$type))
> colnames(cured_data) <- c('date', 'type')
固化后的数据如下所示:
> head(cured_data)
date type
1 2011-09-22 14:54:53 2
2 2011-09-22 14:54:53 2
3 2011-09-22 14:54:53 2
4 2011-09-22 14:54:53 2
5 2011-09-22 14:54:53 1
6 2011-09-22 14:54:53 1
我为 xts 和 zoo 阅读了很多样本,但不知何故我无法掌握它。输出数据应类似于:
date type count
2011-09-22 14:54:00 CEST 1 11
2011-09-22 14:54:00 CEST 2 19
2011-09-22 15:00:00 CEST 1 9
2011-09-22 15:00:00 CEST 2 12
2011-09-22 15:06:00 CEST 1 23
2011-09-22 15:06:00 CEST 2 18
Zoo 的聚合函数看起来很有希望,我发现了这个代码片段:
# aggregate POSIXct seconds data every 10 minutes
tt <- seq(10, 2000, 10)
x <- zoo(tt, structure(tt, class = c("POSIXt", "POSIXct")))
aggregate(x, time(x) - as.numeric(time(x)) %% 600, mean)
现在我只是想知道如何将其应用于我的用例。
我尝试过天真:
> zoo_data <- zoo(cured_data$type, structure(cured_data$time, class = c("POSIXt", "POSIXct")))
> aggr_data = aggregate(zoo_data$type, time(zoo_data$time), - as.numeric(time(zoo_data$time)) %% 360, count)
Error in `$.zoo`(zoo_data, type) : not possible for univariate zoo series
我必须承认我对 R 不是很有信心,但我会尝试。:-)
我有点迷路了。谁能指出我正确的方向?
非常感谢!干杯,亚历克斯。
这里是我数据的一小部分的 dput 输出。数据本身大约有 8000 万行。
structure(list(date = structure(c(1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885,
1316697885, 1316697885), class = c("POSIXct", "POSIXt"), tzone = ""),
type = c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L)), .Names = c("date",
"type"), row.names = c(NA, -23L), class = "data.frame")