r - How to define encounter periods using the first and last timestamps of a time series in R

Question

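I work with electronically tagged fish. Below is a snippet of my telemetry data (data frame "d"). Each timestamp represents a detection of a unique fish.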

TagID          Detection              Location      RiverKm
163            02/23/2012 03:17:44    Alcatraz_E     4.414
163            02/23/2012 03:56:25    Alcatraz_E     4.414
163            04/14/2012 15:10:20    Alcatraz_E     4.414
163            04/14/2012 15:12:11    Alcatraz_N     4.414
163            03/11/2012 08:59:48    Alcatraz_N     4.414
163            03/11/2012 09:02:15    Alcatraz_N     4.414
163            03/11/2012 09:04:05    Alcatraz_N     4.414
163            03/11/2012 09:04:06    Alcatraz_N     4.414
163            03/11/2012 09:06:09    Alcatraz_N     4.414
163            03/11/2012 09:06:11    Alcatraz_E     4.414

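There are many different TagIDs (individual fish). I want to classify the detections into encounter periods for each fish by identifying a start time ("arrival") and an end time ("departure"), using a cutoff of 1 hour. For example, for the fish above (TagID 163), the output would be: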

TagID       arrival                  departure            Location        RiverKm
163        02/23/2012 03:17:44    02/23/2012 03:56:25     Alcatraz_E       4.414 
163        04/14/2012 15:10:20    04/14/2012 15:12:11     Alcatraz_N       4.414
163        03/11/2012 08:59:48    03/11/2012 09:06:11     Alcatraz_E       4.414

I would like to create a loop (or any other code structure) that does the following:

for j in 1:length(unique(d$TagID))

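Determine the time of the first detection ("t1").
If the next detection of that tag in the time series ("t2") is less than one hour after t1, skip it and move on to the next detection; otherwise, put t1 in the "arrival" vector and t2 in the "departure" vector.
Stop once every arrival and departure timestamp has been classified for each TagID.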
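The steps above can be written almost literally as a loop. The following is a minimal base R sketch under stated assumptions, not a definitive implementation: it assumes Detection has already been converted to POSIXct (parsed here as UTC), reads "less than one hour" as a gap strictly under 60 minutes, and keeps the location of the arrival detection. `encounters_loop()` is a hypothetical helper name. Instead of storing t2 as the departure, it closes an encounter at the last detection before a gap of an hour or more, which matches the expected output above.

```r
# Hypothetical helper: walks each tag's detections in time order and closes an
# encounter when the next detection is an hour or more away (or there is none).
encounters_loop <- function(d, cutoff_mins = 60) {
  out <- list()
  for (tag in unique(d$TagID)) {
    x <- d[d$TagID == tag, ]
    x <- x[order(x$Detection), ]           # detections of this tag, oldest first
    start <- 1                              # row of the current encounter's arrival
    for (i in seq_len(nrow(x))) {
      if (i == nrow(x)) {
        gap <- Inf                          # the last detection always closes an encounter
      } else {
        gap <- as.numeric(difftime(x$Detection[i + 1], x$Detection[i],
                                   units = "mins"))
      }
      if (gap >= cutoff_mins) {
        out[[length(out) + 1]] <- data.frame(
          TagID     = tag,
          arrival   = x$Detection[start],
          departure = x$Detection[i],
          Location  = x$Location[start],    # assumption: keep the arrival location
          RiverKm   = x$RiverKm[start],
          stringsAsFactors = FALSE)
        start <- i + 1                      # the next detection opens a new encounter
      }
    }
  }
  do.call(rbind, out)
}

# The question's sample for TagID 163 (times parsed as UTC, an assumption)
d <- data.frame(
  TagID = 163,
  Detection = c("02/23/2012 03:17:44", "02/23/2012 03:56:25", "04/14/2012 15:10:20",
                "04/14/2012 15:12:11", "03/11/2012 08:59:48", "03/11/2012 09:02:15",
                "03/11/2012 09:04:05", "03/11/2012 09:04:06", "03/11/2012 09:06:09",
                "03/11/2012 09:06:11"),
  Location = c("Alcatraz_E", "Alcatraz_E", "Alcatraz_E", "Alcatraz_N", "Alcatraz_N",
               "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_E"),
  RiverKm = 4.414,
  stringsAsFactors = FALSE)
d$Detection <- as.POSIXct(strptime(d$Detection, "%m/%d/%Y %H:%M:%S", tz = "UTC"))

res <- encounters_loop(d)
res
```

A loop like this is easy to follow but iterates row by row in R, so it gets slow on large telemetry files; the vectorised diff()/cumsum() approach in the answers avoids that.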

I don't know how to do this in the most efficient way, and I would really appreciate your help.

Thanks!

score 2 · Accepted Answer

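You should first sort your data by date, which is why you should convert your Detection variable to a proper R date-time type: POSIXct. Once your data is sorted, you can use diff and cumsum to create a grouping variable that detects jumps: here, a jump is a gap of more than one hour (60 minutes). I used data.table syntactic sugar for the grouping operation, but you don't really need it unless you have a lot of data.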
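The diff()/cumsum() trick is easiest to see on a single tag's sorted times. This short sketch (times parsed as UTC, an assumption) flags every gap longer than 60 minutes and turns the running count of flags into an encounter id. It converts the gaps to minutes explicitly, because diff() on POSIXct returns a difftime whose units (secs, mins, hours, days) are chosen automatically from the data.

```r
# Detection times for one tag, already sorted (UTC assumed for reproducibility)
det <- as.POSIXct(strptime(c("02/23/2012 03:17:44", "02/23/2012 03:56:25",
                             "03/11/2012 08:59:48", "03/11/2012 09:06:11",
                             "04/14/2012 15:10:20", "04/14/2012 15:12:11"),
                           "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Gap to the previous detection, forced into minutes
gap_mins <- as.numeric(diff(det), units = "mins")

# TRUE where a gap exceeds one hour, i.e. where a new encounter starts.
# The leading FALSE stands for the first detection, which has no predecessor.
new_encounter <- c(FALSE, gap_mins > 60)

# The running count of flags numbers the encounters 0, 1, 2, ...
id <- cumsum(new_encounter)
id
# [1] 0 0 1 1 2 2
```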

这是我的完整代码：

library(data.table)
## coerce Detection to POSIXct
d$Detection <- 
  as.POSIXct(strptime(d$Detection,'%m/%d/%Y %H:%M:%S'))
## sort by TagID, then by Detection
d <- d[order(d$TagID, d$Detection),]
setDT(d)
## id is an incrementing variable that detects a jump of more than an hour within each tag;
## the gap is converted to minutes explicitly because diff() on POSIXct picks its units automatically
d[, id := cumsum(c(FALSE, as.numeric(diff(Detection), units = "mins") > 60)), by = TagID]
## you don't mention how to choose Location, RiverKm, so I take the first ones by default
d[,list(start   =Detection[1],
        end     =Detection[.N],
        Location=Location[1],
        RiverKm =RiverKm[1]),
  by = "TagID,id"]

#    TagID id               start                 end   Location RiverKm
# 1:   163  0 2012-02-23 03:17:44 2012-02-23 03:56:25 Alcatraz_E   4.414
# 2:   163  1 2012-03-11 08:59:48 2012-03-11 09:06:11 Alcatraz_N   4.414
# 3:   163  2 2012-04-14 15:10:20 2012-04-14 15:12:11 Alcatraz_E   4.414
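As the answer says, data.table is not required for small data. A minimal base R sketch of the same grouping, assuming the question's column names and parsing times as UTC, computes the encounter id separately for each tag with ave(), so gaps between one fish's last detection and another fish's first detection never enter the calculation, and then collapses each (TagID, id) group with split() and lapply(). The second tag (164) is just the question's data duplicated for illustration.

```r
# Sample data: the question's detections, duplicated for a second tag (164)
times <- c("02/23/2012 03:17:44", "02/23/2012 03:56:25", "04/14/2012 15:10:20",
           "04/14/2012 15:12:11", "03/11/2012 08:59:48", "03/11/2012 09:02:15",
           "03/11/2012 09:04:05", "03/11/2012 09:04:06", "03/11/2012 09:06:09",
           "03/11/2012 09:06:11")
locs <- c("Alcatraz_E", "Alcatraz_E", "Alcatraz_E", "Alcatraz_N", "Alcatraz_N",
          "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_E")
d <- data.frame(TagID = rep(c(163, 164), each = 10),
                Detection = rep(times, 2), Location = rep(locs, 2),
                RiverKm = 4.414, stringsAsFactors = FALSE)

# Parse (UTC assumed) and sort by tag, then by time
d$Detection <- as.POSIXct(strptime(d$Detection, "%m/%d/%Y %H:%M:%S", tz = "UTC"))
d <- d[order(d$TagID, d$Detection), ]

# Encounter id within each tag: numeric POSIXct values are seconds, so
# diff(t) / 60 is a gap in minutes; a gap over 60 minutes starts a new encounter
d$id <- ave(as.numeric(d$Detection), d$TagID,
            FUN = function(t) cumsum(c(FALSE, diff(t) / 60 > 60)))

# One row per (TagID, id): first/last detection plus the first Location/RiverKm
res <- do.call(rbind, lapply(split(d, list(d$TagID, d$id), drop = TRUE), function(g) {
  data.frame(TagID = g$TagID[1], id = g$id[1],
             start = g$Detection[1], end = g$Detection[nrow(g)],
             Location = g$Location[1], RiverKm = g$RiverKm[1],
             stringsAsFactors = FALSE)
}))
res <- res[order(res$TagID, res$id), ]
rownames(res) <- NULL
res
```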

score 1 · Accepted Answer

Here is a similar approach with dplyr (version 0.3). I edited my code to use the new features of 0.3.

# If you need to download the latest development version
if (packageVersion("devtools") < 1.6) {
install.packages("devtools")
}
devtools::install_github("hadley/lazyeval")
devtools::install_github("hadley/dplyr")

library(dplyr)

foo <- data.frame(
    TagID = rep(c(163:164), each = 10),
    Detection = rep(c("02/23/2012 03:17:44", "02/23/2012 03:56:25", "04/14/2012 15:10:20",
                  "04/14/2012 15:12:11", "03/11/2012 08:59:48", "03/11/2012 09:02:15",
                  "03/11/2012 09:04:05", "03/11/2012 09:04:06", "03/11/2012 09:06:09",
                  "03/11/2012 09:06:11"), times = 2),
    Location = rep(c("Alcatraz_E", "Alcatraz_E", "Alcatraz_E", "Alcatraz_N", "Alcatraz_N",
                 "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_E"),times = 2),
    RiverKm = 4.414,
    stringsAsFactors = FALSE)

foo$Detection <- as.POSIXct(strptime(foo$Detection,'%m/%d/%Y %H:%M:%S'))

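foo %>%
    arrange(TagID, Detection) %>%
    group_by(TagID) %>%
    # new encounter whenever the gap to the previous detection of the same tag exceeds 60 minutes
    mutate(id = cumsum(c(FALSE, as.numeric(diff(Detection), units = "mins") > 60))) %>%
    group_by(TagID, id) %>%
    slice(c(1, n())) %>%
    mutate(Departure = Detection[2]) %>%
    slice(1) %>%
    ungroup()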


#  TagID           Detection   Location RiverKm id           Departure
#1   163 2012-02-23 03:17:44 Alcatraz_E   4.414  0 2012-02-23 03:56:25
#2   163 2012-03-11 08:59:48 Alcatraz_N   4.414  1 2012-03-11 09:06:11
#3   163 2012-04-14 15:10:20 Alcatraz_E   4.414  2 2012-04-14 15:12:11
#4   164 2012-02-23 03:17:44 Alcatraz_E   4.414  0 2012-02-23 03:56:25
#5   164 2012-03-11 08:59:48 Alcatraz_N   4.414  1 2012-03-11 09:06:11
#6   164 2012-04-14 15:10:20 Alcatraz_E   4.414  2 2012-04-14 15:12:11
