Here is an example using data like yours: loading it into R, aggregating it, and so on.
First, some dummy data to write out to a file:
stime <- as.POSIXct("2011-01-01-00:00:00", format = "%Y-%d-%m-%H:%M:%S")
## dummy data
dat <- data.frame(Timestamp = seq(from = stime, by = 5, length = 2000000),
DD1 = sample(1:1000, replace = TRUE),
DD2 = sample(1:1000, replace = TRUE),
DD3 = sample(1:1000, replace = TRUE),
DD4 = sample(1:1000, replace = TRUE))
## write it out
write.csv(dat, file = "timestamp_data.txt", row.names = FALSE)
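Then we can time reading in the 2 million rows. To speed this up, we tell R the classes of the columns in the file: "POSIXct" is one way R stores timestamps of the kind you have.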
## read it in:
system.time({
tsdat <- read.csv("timestamp_data.txt", header = TRUE,
colClasses = c("POSIXct",rep("integer", 4)))
})
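On my modest laptop, reading the file and converting the timestamps to internal Unix time takes about 13 seconds of CPU time (about 20 seconds elapsed):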
user system elapsed
13.698 5.827 19.643
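To make the role of colClasses and the "internal Unix time" concrete, here is a minimal sketch on a three-row file written by hand. The temporary file and the names tmp, small and secs are made up for illustration; only read.csv() and its colClasses argument come from the answer above.

```r
## Write three rows by hand, in the same layout as timestamp_data.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c('"Timestamp","DD1"',
             '"2011-01-01 00:00:00",1',
             '"2011-01-01 00:00:05",2',
             '"2011-01-01 00:00:10",3'), tmp)

## Declare the column classes up front so R does not have to guess them
small <- read.csv(tmp, header = TRUE, colClasses = c("POSIXct", "integer"))

class(small$Timestamp)               # the column is stored as POSIXct
secs <- as.numeric(small$Timestamp)  # underlying seconds since 1970-01-01 UTC
diff(secs)                           # spacing between rows, in seconds

unlink(tmp)                          # remove the temporary file
```

Because a POSIXct column is just a number of seconds underneath, arithmetic such as taking a mean of timestamps works directly, which is what the aggregation step relies on.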
Aggregation can be done in lots of ways; one is to use aggregate(). Say we want to aggregate to hourly means:
## Generate some indexes that we'll use the aggregate over
tsdat <- transform(tsdat,
hours = factor(strftime(tsdat$Timestamp, format = "%H")),
jday = factor(strftime(tsdat$Timestamp, format = "%j")))
## compute the mean of the 4 variables (and of Timestamp) for each hour of each day
out <- aggregate(cbind(Timestamp, DD1, DD2, DD3, DD4) ~ hours + jday,
data = tsdat, FUN = mean)
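## convert average Timestamp to a POSIX time
## The origin is built in UTC: a local-time origin shifts every result by that
## zone's offset on 1970-01-01, which is the likely cause of the one-hour lag
## between Timestamp and hours in the sample output shown below.
out <- transform(out,
Timestamp = as.POSIXct(Timestamp,
origin = ISOdatetime(1970,1,1,0,0,0, tz = "UTC")))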
That (the line creating out) takes about 16 seconds on my laptop and gives the following output:
> head(out)
hours jday Timestamp DD1 DD2 DD3 DD4
1 00 001 2010-12-31 23:29:57 500.2125 491.4333 510.7181 500.4833
2 01 001 2011-01-01 00:29:57 516.0472 506.1264 519.0931 494.2847
3 02 001 2011-01-01 01:29:57 507.5653 499.4972 498.9653 509.1389
4 03 001 2011-01-01 02:29:57 520.4111 500.8708 514.1514 491.0236
5 04 001 2011-01-01 03:29:57 498.3222 500.9139 513.3194 502.6514
6 05 001 2011-01-01 04:29:57 515.5792 497.1194 510.2431 496.8056
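As a possible alternative to the separate hours and jday factors, the sketch below rounds each timestamp down to the start of its hour with trunc() and aggregates on that single index. This is not the approach used above, just another base R route; the data frame demo, its made-up values, and the column name hour_start are assumptions for illustration.

```r
## Two hours of made-up readings at 5-second spacing:
## the first hour is all 10s, the second hour is all 20s
demo <- data.frame(
  Timestamp = as.POSIXct("2011-01-01 00:00:00", tz = "UTC") +
    seq(0, by = 5, length.out = 1440),
  DD1 = rep(c(10, 20), each = 720)
)

## Round every timestamp down to the start of its hour;
## trunc() returns POSIXlt, so convert back to POSIXct for the data frame
demo$hour_start <- as.POSIXct(trunc(demo$Timestamp, units = "hours"))

## One row per hour, holding the mean of DD1 within that hour
hourly <- aggregate(DD1 ~ hour_start, data = demo, FUN = mean)
hourly
```

With a single time index there is no averaged Timestamp to convert back afterwards, at the cost of labelling each row by the start of the hour rather than by its midpoint.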
Simple plotting can be done with the plot() function:
plot(DD1 ~ Timestamp, data = out, type = "l")
We can overlay more variables, for example:
ylim <- with(out, range(DD1, DD2))
plot(DD1 ~ Timestamp, data = out, type = "l", ylim = ylim)
lines(DD2 ~ Timestamp, data = out, type = "l", col = "red")
or use multiple panels:
layout(1:2)
plot(DD1 ~ Timestamp, data = out, type = "l", col = "blue")
plot(DD2 ~ Timestamp, data = out, type = "l", col = "red")
layout(1)
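The overlay idea extends to all four variables with a short loop. The following is a rough sketch on a made-up stand-in for out (the data frame demo and its random values are purely illustrative, as are the colour choices), using only base graphics.

```r
## Made-up stand-in for `out`: 24 hourly means for four variables
set.seed(42)
demo <- data.frame(
  Timestamp = seq(as.POSIXct("2011-01-01 00:30:00", tz = "UTC"),
                  by = "hour", length.out = 24),
  DD1 = rnorm(24, mean = 500, sd = 10),
  DD2 = rnorm(24, mean = 500, sd = 10),
  DD3 = rnorm(24, mean = 500, sd = 10),
  DD4 = rnorm(24, mean = 500, sd = 10)
)

vars <- c("DD1", "DD2", "DD3", "DD4")
cols <- c("black", "red", "blue", "darkgreen")

## Common y-axis limits so that no series is clipped
ylim <- range(unlist(demo[vars]))

## Draw the first series, then add the remaining ones on top
plot(demo$Timestamp, demo[[vars[1]]], type = "l", col = cols[1],
     ylim = ylim, xlab = "Timestamp", ylab = "Hourly mean")
for (i in seq_along(vars)[-1]) {
  lines(demo$Timestamp, demo[[vars[i]]], col = cols[i])
}
legend("topright", legend = vars, col = cols, lty = 1, bty = "n")
```

Computing ylim across every plotted column up front matters because plot() fixes the axis range from the first series only.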
All of this was done with base R functionality. Others have shown how add-on packages can make working with dates easier.