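I am trying to split my data into 5-second time intervals and group them using dplyr.
Below is my raw data. The date and time are in separate columns, which I later combine using POSIXct.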

structure(list(Date = c("10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013"), Time = c("20:06:57", "20:07:13", "20:07:25", "20:07:30", "20:08:16", "20:08:17", "20:08:26", "20:09:05", "20:09:06", "20:09:07", "20:09:37", "20:09:38", "20:09:55", "20:12:34", "20:14:15"), ID = c("M1", "M1", "M1", "M3", "M1", "M1", "M8", "M9", "M9", "M9", "M1", "M1", "M1", "M5", "M1")), .Names = c("Date", "Time", "ID"), class = "data.frame", row.names = c(NA, -15L))

My code is below:

data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S") 
data_order <- data %>%  arrange(datetime,ID)      
data_order$group <-  data_order  %>% group_by(by5sec=cut(datetime, breaks= "5 secs",right =T),ID) %>% group_indices() 

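Some observations are grouped correctly, but others are wrong. I tried two versions, one with "right = T" and one without it. They give different groups, but both contain errors. I also tried as.numeric, as.POSIXct and so on, all in vain.

The output of both versions is attached. The rows in red are incorrectly coded as 2 different groups.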

**Version 1: "right = T" in cut**

[image: output of version 1, not included]

**Version 2: "right = F" in cut**

[image: output of version 2, not included]

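Can someone help with this? I have already spent a long time on it, and given my knowledge of R it has been a goose chase. All I want is 5-second breaks within the same ID (the group should also change when the ID changes).

Desired output

[image: desired output, not included]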

1 Answer

I'm not entirely clear on the output images you show. Going by your problem description, how about something like this?

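library(tidyverse)

# df is the data frame from the question (called data there)
df %>%
    # Combine Date and Time into one string column, keeping the originals
    unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
    mutate(
        datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
        # 1-second index from the first timestamp, integer-divided into 5-second bins
        datetime.by5sec = as.numeric(cut(datetime, "sec")) %/% 5 + 1)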
#        datetime       Date     Time ID datetime.by5sec
#1  2013-10-30 20:06:57 10/30/2013 20:06:57 M1               1
#2  2013-10-30 20:07:13 10/30/2013 20:07:13 M1               4
#3  2013-10-30 20:07:25 10/30/2013 20:07:25 M1               6
#4  2013-10-30 20:07:30 10/30/2013 20:07:30 M3               7
#5  2013-10-30 20:08:16 10/30/2013 20:08:16 M1              17
#6  2013-10-30 20:08:17 10/30/2013 20:08:17 M1              17
#7  2013-10-30 20:08:26 10/30/2013 20:08:26 M8              19
#8  2013-10-30 20:09:05 10/30/2013 20:09:05 M9              26
#9  2013-10-30 20:09:06 10/30/2013 20:09:06 M9              27
#10 2013-10-30 20:09:07 10/30/2013 20:09:07 M9              27
#11 2013-10-30 20:09:37 10/30/2013 20:09:37 M1              33
#12 2013-10-30 20:09:38 10/30/2013 20:09:38 M1              33
#13 2013-10-30 20:09:55 10/30/2013 20:09:55 M1              36
#14 2013-10-30 20:12:34 10/30/2013 20:12:34 M5              68
#15 2013-10-30 20:14:15 10/30/2013 20:14:15 M1              88

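Explanation: datetime.by5sec gives the index of the 5-second bin that datetime falls into, counted from the first timestamp. So the first entry is in bin 1; the second entry is in the 4th 5-second bin, i.e. within 20 seconds of the first entry; and so on. Here I use integer division %/% 5 on the 1-second index returned by cut(datetime, "sec"). Because that index starts at 1, the first bin covers only 4 seconds. Note that cut.POSIXt also accepts a multiple such as "5 secs" directly, as in the question.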
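To make the fixed-bin idea concrete, here is a minimal base R sketch. It is not the answer's exact formula: it assumes zero-based elapsed seconds, so that every bin spans a full 5 seconds, and it assumes a UTC time zone. The names time, elapsed and bin are illustrative only.

```r
# Times from the question; every row has the same date.
time <- c("20:06:57", "20:07:13", "20:07:25", "20:07:30", "20:08:16",
          "20:08:17", "20:08:26", "20:09:05", "20:09:06", "20:09:07",
          "20:09:37", "20:09:38", "20:09:55", "20:12:34", "20:14:15")
datetime <- as.POSIXct(paste("10/30/2013", time),
                       format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Seconds elapsed since the first observation (zero-based).
elapsed <- as.numeric(difftime(datetime, min(datetime), units = "secs"))

# Fixed 5-second bins anchored at the first observation:
# 0-4 s is bin 1, 5-9 s is bin 2, and so on.
bin <- elapsed %/% 5 + 1

# Rows 9 and 10 are one second apart but fall on either side of a bin edge.
print(data.frame(time, elapsed, bin)[9:10, ])
```

With these assumptions, 20:09:06 (129 s elapsed) lands in bin 26 and 20:09:07 (130 s elapsed) in bin 27. Any fixed grid can split neighbouring observations this way, which is probably why neither cut() variant in the question matches the desired output.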


Update

The following reproduces your expected output:

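df %>%
    unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
    group_by(ID) %>%
    mutate(
        datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
        # Seconds since the previous entry of the same ID; the epoch default
        # makes the first entry of each ID start a new group
        difftime = difftime(
            datetime,
            lag(datetime, default = as.POSIXct(0, origin = "1970-01-01")),
            units = "secs")) %>%
    ungroup() %>%
    mutate(
        # A gap of 5 seconds or more starts a new group
        group = cumsum(abs(difftime) >= 5)) %>%
    select(Date, Time, ID, datetime, group)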
## A tibble: 15 x 5
#   Date       Time     ID    datetime            group
#   <chr>      <chr>    <chr> <dttm>              <int>
# 1 10/30/2013 20:06:57 M1    2013-10-30 20:06:57     1
# 2 10/30/2013 20:07:13 M1    2013-10-30 20:07:13     2
# 3 10/30/2013 20:07:25 M1    2013-10-30 20:07:25     3
# 4 10/30/2013 20:07:30 M3    2013-10-30 20:07:30     4
# 5 10/30/2013 20:08:16 M1    2013-10-30 20:08:16     5
# 6 10/30/2013 20:08:17 M1    2013-10-30 20:08:17     5
# 7 10/30/2013 20:08:26 M8    2013-10-30 20:08:26     6
# 8 10/30/2013 20:09:05 M9    2013-10-30 20:09:05     7
# 9 10/30/2013 20:09:06 M9    2013-10-30 20:09:06     7
#10 10/30/2013 20:09:07 M9    2013-10-30 20:09:07     7
#11 10/30/2013 20:09:37 M1    2013-10-30 20:09:37     8
#12 10/30/2013 20:09:38 M1    2013-10-30 20:09:38     8
#13 10/30/2013 20:09:55 M1    2013-10-30 20:09:55     9
#14 10/30/2013 20:12:34 M5    2013-10-30 20:12:34    10
#15 10/30/2013 20:14:15 M1    2013-10-30 20:14:15    11

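Explanation: for each ID, compute the time difference between consecutive datetime entries. The first entry of each ID is compared against the lag default (the epoch), so it always starts a new group. group is then the running count of rows whose time difference is >= 5 seconds.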
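The same gap-based logic can be sketched in base R without dplyr. This is an illustrative equivalent rather than the answer's code: it assumes rows are already sorted by time, uses Inf as the gap for the first entry of each ID, and the names secs, gap and group are hypothetical.

```r
# Times and IDs from the question; every row has the same date.
time <- c("20:06:57", "20:07:13", "20:07:25", "20:07:30", "20:08:16",
          "20:08:17", "20:08:26", "20:09:05", "20:09:06", "20:09:07",
          "20:09:37", "20:09:38", "20:09:55", "20:12:34", "20:14:15")
ID <- c("M1", "M1", "M1", "M3", "M1", "M1", "M8", "M9", "M9", "M9",
        "M1", "M1", "M1", "M5", "M1")
datetime <- as.POSIXct(paste("10/30/2013", time),
                       format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
secs <- as.numeric(datetime)

# Gap in seconds to the previous entry of the same ID;
# Inf marks the first entry of each ID.
gap <- ave(secs, ID, FUN = function(s) c(Inf, diff(s)))

# Every gap of 5 seconds or more starts a new group.
group <- cumsum(gap >= 5)

print(data.frame(time, ID, gap, group))
```

On the question's data this yields the groups 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 11, matching the tibble above. Like the dplyr version, it compares each row only with the previous row of the same ID, so two IDs interleaved within 5 seconds of each other could end up sharing a group number.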

answered 2018-03-09T12:35:34.417