r - Convert a list of events into a series of event counts every two minutes

Question

There are two closely related posts here and here, but I couldn't adapt either of them to my exact situation.

Here is a vector of times:

start.time = as.POSIXct("2013-06-20 01:00:00")
x = start.time + runif(5, min = 0, max = 8*60)
x = x[order(x)]
x
# [1] "2013-06-20 01:00:30 EDT" "2013-06-20 01:00:57 EDT"
# [3] "2013-06-20 01:01:43 EDT" "2013-06-20 01:04:01 EDT"
# [5] "2013-06-20 01:04:10 EDT"

Next, here is a vector of two-minute marks:

y = seq(as.POSIXct("2013-06-20 01:00:00"), as.POSIXct("2013-06-20 01:06:00"), 60*2)
y
# [1] "2013-06-20 01:00:00 EDT" "2013-06-20 01:02:00 EDT"
# [3] "2013-06-20 01:04:00 EDT" "2013-06-20 01:06:00 EDT"

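I would like a fast, flexible, scalable way to count the elements of x that fall into the two-minute bin to the right of each element of y, like this: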

                    y count.x
1 2013-06-20 01:00:00       3
2 2013-06-20 01:02:00       0
3 2013-06-20 01:04:00       2
4 2013-06-20 01:06:00       0
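For comparison, the counting rule the question asks for (each element of x counted in the two-minute bin that starts at a mark in y) can also be sketched with base R's findInterval() and tabulate(). This is a swapped-in technique, not one used in the answers; the event times are fixed to the values printed above and put in UTC, both assumptions made only so the result is reproducible.

```r
# Event times fixed to the printed example (the question draws them with runif)
x <- as.POSIXct(c("2013-06-20 01:00:30", "2013-06-20 01:00:57",
                  "2013-06-20 01:01:43", "2013-06-20 01:04:01",
                  "2013-06-20 01:04:10"), tz = "UTC")
# Two-minute marks, as in the question
y <- seq(as.POSIXct("2013-06-20 01:00:00", tz = "UTC"),
         as.POSIXct("2013-06-20 01:06:00", tz = "UTC"), by = 120)

# findInterval() returns i with y[i] <= x < y[i + 1] (0 if x < y[1]),
# i.e. the index of the mark whose right-hand bin contains each event
idx <- findInterval(as.numeric(x), as.numeric(y))
# tabulate() counts occurrences of 1..length(y); index 0 is ignored
counts <- tabulate(idx, nbins = length(y))
data.frame(y = y, count.x = counts)
```

Events after the last mark get the last index, which matches the open-ended final bin in the desired output.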

score 3 · Accepted Answer

How about

as.data.frame(table(cut(x, breaks=c(y, Inf))))

                 Var1 Freq
1 2013-06-20 01:00:00    3
2 2013-06-20 01:02:00    0
3 2013-06-20 01:04:00    2
4 2013-06-20 01:06:00    0
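A small check of the binning convention cut() uses here: with right = FALSE (the default for date-times, written out explicitly below) each bin is [y[i], y[i + 1]), so an event exactly on a mark is counted in that mark's bin, and the Inf break catches everything after the last mark. Because table() counts every factor level, empty bins still appear with 0. The two test events and the switch to numeric seconds (which keeps the Inf break independent of how c() combines a date-time with a number) are assumptions for illustration.

```r
# One event exactly on a mark (01:02:00) and one past the last mark
ev <- as.POSIXct(c("2013-06-20 01:02:00", "2013-06-20 01:07:30"), tz = "UTC")
marks <- seq(as.POSIXct("2013-06-20 01:00:00", tz = "UTC"),
             as.POSIXct("2013-06-20 01:06:00", tz = "UTC"), by = 120)

# right = FALSE gives [a, b) bins; Inf makes the last bin open-ended
f <- cut(as.numeric(ev), breaks = c(as.numeric(marks), Inf), right = FALSE)
# table() counts every level of the factor, including empty ones
counts <- as.vector(table(f))
counts
```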

score 0 · Accepted Answer

Here is a function that solves the problem and runs faster than table(cut(...)):

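get.bin.counts = function(x, name.x = "x", start.pt, end.pt, bin.width){
  # Break points every bin.width seconds; the last one may fall short of end.pt
  br.pts = seq(start.pt, end.pt, bin.width)
  # hist() stops with an error if any value lies outside the breaks, so keep
  # only the values between the first and last break points
  x = x[(x >= start.pt) & (x <= max(br.pts))]
  # plot = FALSE returns the histogram object without drawing it;
  # right = FALSE gives [a, b) bins, as cut() does for date-times
  counts = hist(x, breaks = br.pts, plot = FALSE, right = FALSE)$counts
  # One row per bin, labelled by the bin's left edge
  dfm = data.frame(br.pts[-length(br.pts)], counts)
  names(dfm) = c(name.x, "freq")
  return(dfm)
}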

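The key line is the one in the middle, counts = hist(...). The crucial part is calling hist with plot = FALSE: instead of drawing a histogram, it returns the histogram object, whose counts component holds the number of events in each bin.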
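To see what the key line returns in isolation, here is hist() with plot = FALSE on plain numbers: the question's five example times, converted by hand to seconds after 01:00:00 (that conversion is an assumption of this sketch). right = FALSE is passed so the bins are [a, b), as with cut().

```r
# Event times as seconds after 01:00:00 (01:00:30, 01:00:57, 01:01:43, 01:04:01, 01:04:10)
secs <- c(30, 57, 103, 241, 250)
# Bin edges every 120 seconds
br <- seq(0, 360, by = 120)
# Nothing is drawn; hist() returns an object of class "histogram"
h <- hist(secs, breaks = br, plot = FALSE, right = FALSE)
h$breaks  # the bin edges
h$counts  # one count per bin
```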

To test the function's speed, I ran it as follows:

# First define x, a large vector of times:    
start.time = as.POSIXct("2012-11-01 00:00:00")
x = start.time + runif(50000, min = 0, max = 365*24*3600)
x = x[order(x)]
# Apply the function, keeping track of running time:
t1 = Sys.time()
dfm = get.bin.counts(x, name.x = "time", 
                     start.pt = as.POSIXct("2012-11-01 00:00:00"),
                     end.pt = as.POSIXct("2013-07-01 00:00:00"), 
                     bin.width = 60)
as.numeric(Sys.time() - t1, units = "secs") # prints elapsed time in seconds

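In this example, my function runs a little more than 10 times faster than table(cut(...)). Credit goes to the cut help page, which states: "Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry."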
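One way to reproduce the comparison is to time both approaches with system.time() and check that they return the same counts. This sketch works on plain numeric seconds rather than POSIXct (an assumption that keeps the two calls on the same footing), and it places events at whole seconds plus 0.5 so that none sits on a break point, where the two methods' boundary handling could differ. No timings are quoted here; they depend on the machine and R version.

```r
set.seed(1)  # reproducible random data
# One year of random event times in seconds, offset by 0.5 s from any break
x <- sort(floor(runif(50000, min = 0, max = 365 * 24 * 3600)) + 0.5)
# One-minute bins over the first 242 days (2012-11-01 to 2013-07-01)
br <- seq(0, 242 * 24 * 3600, by = 60)
# Keep only events inside the break range
x <- x[x < max(br)]

time.cut  <- system.time(n.cut  <- as.vector(table(cut(x, breaks = br, right = FALSE))))
time.hist <- system.time(n.hist <- hist(x, breaks = br, plot = FALSE, right = FALSE)$counts)

# Same counts per bin from both methods
all.equal(as.numeric(n.cut), as.numeric(n.hist))
# Elapsed seconds for each approach
c(cut = time.cut[["elapsed"]], hist = time.hist[["elapsed"]])
```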
