2

I have data on 'Start' and 'End' time for different jobs, grouped by 'owner':

Data <- data.frame(
  job = c(1, 2, 3, 4, 5),
  owner = c("name1", "name2", "name1", "name1", "name2"),
  Start = as.POSIXct(c("2015-01-01 15:00:00", "2015-01-01 15:01:00", "2015-01-01 15:13:00", "2015-01-01 15:20:00", "2015-01-01 15:39:02"), format="%Y-%m-%d %H:%M:%S"),
  End =   as.POSIXct(c("2015-01-01 15:11:11", "2015-01-01 15:17:21", "2015-01-01 15:17:00", "2015-01-01 15:31:21", "2015-01-01 15:40:11"), format="%Y-%m-%d %H:%M:%S")
)

For each owner, I want to calculate the idle time between the jobs for each owner, i.e. the difference between the 'End' time of one job and the 'Start' time of the next job.

How do I use difftime() to calculate this time difference between specific rows and times in different columns?

The result should look something like this:

job, owner, idletime
1, name1, NA
2, name2, NA
3, name1, 1.816667  # End of row 1 minus Start of row 3
4, name1, 3.0       # End of row 3 minus Start of row 4
...
4

3 回答 3

5

这是一个可能的解决方案data.table

library(data.table) # v 1.9.5+
setDT(Data)[, idletime := difftime(Start, shift(End), units = "mins"), by = owner]
#    job owner               Start                 End       idletime
# 1:   1 name1 2015-01-01 15:00:00 2015-01-01 15:11:11        NA mins
# 2:   2 name2 2015-01-01 15:01:00 2015-01-01 15:17:21        NA mins
# 3:   3 name1 2015-01-01 15:13:00 2015-01-01 15:17:00  1.816667 mins
# 4:   4 name1 2015-01-01 15:20:00 2015-01-01 15:31:21  3.000000 mins
# 5:   5 name2 2015-01-01 15:39:02 2015-01-01 15:40:11 21.683333 mins

或使用dplyr

library(dplyr)
Data %>%
  group_by(owner) %>%
  mutate(idletime = difftime(Start, lag(End), units = "mins"))

# Source: local data frame [5 x 5]
# Groups: owner
# 
#   job owner               Start                 End       idletime
# 1   1 name1 2015-01-01 15:00:00 2015-01-01 15:11:11        NA mins
# 2   2 name2 2015-01-01 15:01:00 2015-01-01 15:17:21        NA mins
# 3   3 name1 2015-01-01 15:13:00 2015-01-01 15:17:00  1.816667 mins
# 4   4 name1 2015-01-01 15:20:00 2015-01-01 15:31:21  3.000000 mins
# 5   5 name2 2015-01-01 15:39:02 2015-01-01 15:40:11 21.683333 mins
于 2015-08-25T09:09:50.507 回答
3

如果我们使用base R,ave将是一种选择。我们得到lag按“所有者”分组的“结束” ave,使用它作为第二个参数difftime来创建“idtime”。

Data$idtime <- with(Data, difftime(Start, ave(End, owner,FUN=lag), units='mins'))

Data
#  job owner               Start                 End         idtime
#1   1 name1 2015-01-01 15:00:00 2015-01-01 15:11:11        NA mins
#2   2 name2 2015-01-01 15:01:00 2015-01-01 15:17:21        NA mins
#3   3 name1 2015-01-01 15:13:00 2015-01-01 15:17:00  1.816667 mins
#4   4 name1 2015-01-01 15:20:00 2015-01-01 15:31:21  3.000000 mins
#5   5 name2 2015-01-01 15:39:02 2015-01-01 15:40:11 21.683333 mins

注意:我将列名命名为“idtime”以将代码保留在一行中:-)

于 2015-08-25T10:52:16.077 回答
0
library(dplyr)


Data <- data.frame(
  job = c(1, 2, 3, 4, 5),
  owner = c("name1", "name2", "name1", "name1", "name2"),
  Start = as.POSIXct(c("2015-01-01 15:00:00", "2015-01-01 15:01:00", "2015-01-01 15:13:00", "2015-01-01 15:20:00", "2015-01-01 15:39:02"), format="%Y-%m-%d %H:%M:%S"),
  End =   as.POSIXct(c("2015-01-01 15:11:11", "2015-01-01 15:17:21", "2015-01-01 15:17:00", "2015-01-01 15:31:21", "2015-01-01 15:40:11"), format="%Y-%m-%d %H:%M:%S")
)


Data %>% 
  group_by(owner) %>% 
  arrange(Start) %>% 
  mutate(lagEnd = lag(End),
         idletime = difftime(Start,lagEnd, units="mins")) %>%
  ungroup %>%
  arrange(job) %>%
  select(job,owner,idletime)

#   job owner        idletime
# 1   1 name1         NA mins
# 2   2 name2         NA mins
# 3   3 name1   1.816667 mins
# 4   4 name1   3.000000 mins
# 5   5 name2  21.683333 mins
于 2015-08-25T08:56:53.680 回答