
I've looked at plyr, but what I want to do is quite different from the usual use case.

Time               Criteria

17/05/2013 17:22   A
17/05/2013 17:23   A
17/05/2013 17:29   A
17/05/2013 17:22   B
17/05/2013 17:28   B
17/05/2013 17:29   B
25/05/2013 16:56   C
25/05/2013 16:56   C

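I want to split this data by Criteria. Then, for each subset, I want to go through the records and decide whether to keep each one: a record should be kept only if it is within 5 minutes of the record immediately before or after it.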

Desired result:

Time               Criteria  Keep

17/05/2013 17:22   A         T
17/05/2013 17:23   A         T
17/05/2013 17:29   A         F --> 29 is more than 5 mins from 23
17/05/2013 17:22   B         F --> Not keeping this because it is >5min from next record
17/05/2013 17:28   B         T 
17/05/2013 17:29   B         T
25/05/2013 16:56   C         T
25/05/2013 16:56   C         T

Input:

structure(list(Time = structure(c(1368782520, 1368782580, 1368782940, 
1368782520, 1368782880, 1368782940, 1369472160, 1369472160), class = c("POSIXct", 
"POSIXt"), tzone = "Singapore"), Criteria = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("Time", 
"Criteria"), row.names = c(NA, -8L), class = "data.frame")

This works:

library(plyr)

ddply(dat, "Criteria", transform,
      Keep = c(FALSE, diff(Time) <= 5) |
             c(diff(Time) <= 5, FALSE))

#                  Time Criteria  Keep
# 1 2013-05-17 17:22:00        A  TRUE
# 2 2013-05-17 17:23:00        A  TRUE
# 3 2013-05-17 17:29:00        A FALSE
# 4 2013-05-17 17:22:00        B FALSE
# 5 2013-05-17 17:28:00        B  TRUE
# 6 2013-05-17 17:29:00        B  TRUE
# 7 2013-05-25 16:56:00        C  TRUE
# 8 2013-05-25 16:56:00        C  TRUE

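I'm not very familiar with diff() on dates, so you may want to be careful and check whether there is a way to make it consistently return the time difference in minutes (although that happens to be the case in this example).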
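On the units question: diff() on POSIXct lets difftime() pick its units from the smallest gap in the vector, so a group whose gaps are all an hour or more would be reported in hours, and `<= 5` would then accept gaps of up to five hours. A minimal base-R sketch that forces minutes with difftime(..., units = "mins") is below. It assumes rows are sorted by Time within each Criteria group; the helper name keep_near and its limit argument are hypothetical, introduced only for this illustration.

```r
# Data from the question, rebuilt from the epoch seconds in the dput() output.
dat <- data.frame(
  Time = as.POSIXct(c(1368782520, 1368782580, 1368782940,
                      1368782520, 1368782880, 1368782940,
                      1369472160, 1369472160),
                    origin = "1970-01-01", tz = "Singapore"),
  Criteria = factor(c("A", "A", "A", "B", "B", "B", "C", "C"))
)

# Hypothetical helper: TRUE where a record is within `limit` minutes of the
# record before or after it. Assumes `times` is sorted ascending.
keep_near <- function(times, limit = 5) {
  n <- length(times)
  if (n < 2) return(rep(FALSE, n))
  # Force minutes instead of letting difftime() choose the units.
  gaps <- as.numeric(difftime(times[-1], times[-n], units = "mins"))
  close <- gaps <= limit
  # Close to the previous record, or close to the next one.
  c(FALSE, close) | c(close, FALSE)
}

# Sort within groups (assumed; the question's data is already in this order),
# then apply the rule to each Criteria subset and reassemble in row order.
dat <- dat[order(dat$Criteria, dat$Time), ]
dat$Keep <- unsplit(lapply(split(dat$Time, dat$Criteria), keep_near),
                    dat$Criteria)
dat
```

Here split()/unsplit() stands in for ddply(), so plyr is not needed; the per-group rule (within 5 minutes of the previous or next record) is the same one the ddply() call applies.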

answered 2013-06-20T02:36:19.203