I have a simple data frame.

a <- c("06/12/2012 06:00","06/12/2012 06:05","06/12/2012 06:10","06/12/2012 06:15","06/12/2012 06:20","06/12/2012 06:25",
   "06/12/2012 06:30","06/12/2012 06:35","06/12/2012 06:40","06/12/2012 06:45","06/12/2012 06:50","06/12/2012 06:55",
   "06/12/2012 07:00","06/12/2012 07:05","06/12/2012 07:10","06/12/2012 07:15","06/12/2012 07:20","06/12/2012 07:25",
   "06/12/2012 07:30","06/12/2012 07:35","06/12/2012 07:40","06/12/2012 07:45","06/12/2012 07:50","06/12/2012 07:55",
   "06/12/2012 08:00")
a <- strptime(a, "%d/%m/%Y %H:%M")

b <- c("1","0","0","0","2","0","0","0","3","0","0","0","0","0","1","2","5","6","0","0","0","0","6","10","2")
df1 <- data.frame(a,b)

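I would like to use R to remove the parts of the data frame that hold no meaningful data. Data are recorded every 5 minutes. If column "b" contains only zeros for 20 minutes or more of consecutive records, those rows can be dropped from my final data frame.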

If anyone has any ideas that could help, I would be very grateful.

2 Answers

Another approach, still using rle:

is.zero <- df1$b == 0
is.zero.rle <- rle(is.zero)
df1[rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero < 4, ]

Showing the intermediate result may make this easier to follow:

rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero
# [1] 0 3 3 3 0 3 3 3 0 5 5 5 5 5 0 0 0 0 4 4 4 4 0 0 0
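The same masking idea can be wrapped in a small function so that the column and the run-length threshold are parameters. This is only a sketch: the name drop_zero_runs and its arguments col and min_run are hypothetical, the demo data frame reuses the question's "b" values with a made-up id column, and it assumes the column has no missing values.

```r
# Hypothetical helper: drop rows that sit in a run of at least `min_run` zeros.
drop_zero_runs <- function(df, col = "b", min_run = 4) {
  # Convert character/factor values to numbers before comparing with zero
  is.zero <- as.numeric(as.character(df[[col]])) == 0
  runs <- rle(is.zero)
  # Length of the run each row belongs to, set to 0 for non-zero rows
  run.len <- rep(runs$lengths, runs$lengths) * is.zero
  df[run.len < min_run, , drop = FALSE]
}

# Demo data: the "b" values from the question plus an illustrative id column
b <- c("1","0","0","0","2","0","0","0","3","0","0","0","0","0",
       "1","2","5","6","0","0","0","0","6","10","2")
demo <- data.frame(id = seq_along(b), b = b)
kept <- drop_zero_runs(demo)
kept$id
```

With min_run = 4 the rows in the zero runs of length 5 and 4 are dropped, while the two runs of length 3 are kept.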
answered 2013-02-10T19:58:46.190

A solution using rle (as Ben mentioned in the comments):

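# Run-length encode b (converted from character/factor to numeric)
r <- rle(as.numeric(as.character(df1$b)))
# NOTE: this assumes every row is a 5-minute interval,
# so a zero run of length >= 4 covers >= 20 minutes
p <- which(r$values == 0 & r$lengths >= 4)
# w holds the last row of each run; c(0, w)[p] is the row just before run p,
# which also works when the very first run is a zero run
w <- cumsum(r$lengths)
starts <- c(0, w)[p] + 1
o <- unlist(Map(seq, starts, w[p]))
# Guard: with no qualifying runs, df1[-o, ] would return zero rows
if (length(o) > 0) df1[-o, ] else df1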

#                      a  b
# 1  2012-12-06 06:00:00  1
# 2  2012-12-06 06:05:00  0
# 3  2012-12-06 06:10:00  0
# 4  2012-12-06 06:15:00  0
# 5  2012-12-06 06:20:00  2
# 6  2012-12-06 06:25:00  0
# 7  2012-12-06 06:30:00  0
# 8  2012-12-06 06:35:00  0
# 9  2012-12-06 06:40:00  3
# 15 2012-12-06 07:10:00  1
# 16 2012-12-06 07:15:00  2
# 17 2012-12-06 07:20:00  5
# 18 2012-12-06 07:25:00  6
# 23 2012-12-06 07:50:00  6
# 24 2012-12-06 07:55:00 10
# 25 2012-12-06 08:00:00  2
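Since the row-count threshold only corresponds to 20 minutes when the records are evenly spaced, it may be worth checking the spacing first. A minimal sketch, assuming timestamps in the question's format; the names gaps, regular and min_run are illustrative, only three sample timestamps are used, and the time zone is fixed to UTC purely to keep the example reproducible.

```r
# Three sample timestamps in the same format as the question
a <- as.POSIXct(strptime(c("06/12/2012 06:00", "06/12/2012 06:05", "06/12/2012 06:10"),
                         "%d/%m/%Y %H:%M", tz = "UTC"))
# Gaps between consecutive records, in minutes
gaps <- as.numeric(diff(a), units = "mins")
# TRUE only if every record follows the previous one by exactly 5 minutes
regular <- all(gaps == 5)
# Number of consecutive rows that corresponds to 20 minutes of records
min_run <- ceiling(20 / 5)
```

If regular is FALSE, a fixed run length of 4 no longer means 20 minutes, and the threshold would have to be derived from the timestamps themselves.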
answered 2013-02-10T19:41:07.657