
I am trying to find a vectorized approach to replace the following code, which takes a very long time to run:

for (i in 2:nrow(z)) {
  if (z$customerID[i]==z$customerID[i-1]) 
     {z$timeDelta[i]<-(z$time[i]-z$time[i-1])} else {z$timeDelta[i]<- NA}
}

I looked through various apply snippets but did not find anything useful.

Here is some sample data:

customerID    time
    1         2013-04-17 15:30:00 IDT
    1         2013-05-19 11:32:00 IDT
    1         2013-05-20 10:14:00 IDT
    2         2013-03-14 18:41:00 IST
    2         2013-04-24 09:52:00 IDT
    2         2013-04-24 17:08:00 IDT

I would like to get the following output:

customerID    time                        timeDelta*
    1         2013-04-17 15:30:00 IDT     NA
    1         2013-05-19 11:32:00 IDT     31.83 
    1         2013-05-20 10:14:00 IDT     0.94 
    2         2013-03-14 18:41:00 IST     NA
    2         2013-04-24 09:52:00 IDT     40.59
    2         2013-04-24 17:08:00 IDT     0.3 

 *I would prefer timeDelta to be expressed in days

4 Answers

10
z$timeDelta <- NA
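# request days explicitly; diff() otherwise picks its own unit from the smallest gap
z$timeDelta[-1] <- ifelse(tail(z$customerID,-1) == head(z$customerID,-1), as.numeric(diff(z$time), units = "days"), NA)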

Or a shorter version:

z$timeDelta <- NA
# !diff(...) is TRUE where consecutive rows share a (numeric) customerID
z$timeDelta[-1] <- ifelse(!diff(z$customerID), as.numeric(diff(z$time), units = "days"), NA)
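
`diff()` on date-times returns a difftime whose unit is picked automatically from the smallest gap, so it is safer to ask for days explicitly. A minimal self-contained sketch on the question's sample data, assuming `time` is stored as POSIXct; parsing everything in UTC is an assumption made only to keep the example reproducible (the original data is in IST/IDT, so values across the DST change differ slightly):

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Gap to the previous row, always expressed in days
gap_days <- as.numeric(diff(z$time), units = "days")

# TRUE where the previous row belongs to the same customer
same_customer <- diff(z$customerID) == 0

# The first row has no predecessor, so it is NA by construction
z$timeDelta <- c(NA, ifelse(same_customer, gap_days, NA))

print(z)
```

This compares each row only with the one directly above it, so it assumes the rows are already sorted by customerID and then by time.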
answered 2013-08-18T13:06:32.543
2

This works:

## z <- read.table(text="customerID    time
##     1         2013-04-17.15:30:00.IDT
##     1         2013-05-19.11:32:00.IDT
##     1         2013-05-20.10:14:00.IDT
##     2         2013-03-14.18:41:00.IST
##     2         2013-04-24.09:52:00.IDT
##     2         2013-04-24.17:08:00.IDT", header=TRUE)
## 
## z$time <- as.POSIXlt(gsub("\\.", " ", z$time))


do.call(rbind, lapply(split(z, z$customerID), function(x) {
    x$timeDelta <- c(NA, round(as.numeric(diff(x$time), units = "days"), 2))
    x
}))

##     customerID                time timeDelta
## 1.1          1 2013-04-17 15:30:00        NA
## 1.2          1 2013-05-19 11:32:00     31.83
## 1.3          1 2013-05-20 10:14:00      0.95
## 2.4          2 2013-03-14 18:41:00        NA
## 2.5          2 2013-04-24 09:52:00     40.63
## 2.6          2 2013-04-24 17:08:00      0.30
answered 2013-08-18T13:01:48.630
2

This should work for you:

do.call(rbind,lapply(split(mydf,mydf$customerID), function(df)
    # request days explicitly instead of dividing an auto-selected unit by 24
    within(df,timeDelta<-c(NA,as.numeric(diff(time),units="days")))))

Result:

    customerID                time  timeDelta
1.1          1 2013-04-17 15:30:00         NA
1.2          1 2013-05-19 11:32:00 31.8347222
1.3          1 2013-05-20 10:14:00  0.9458333
2.4          2 2013-03-14 18:41:00         NA
2.5          2 2013-04-24 09:52:00 40.5909722
2.6          2 2013-04-24 17:08:00  0.3027778
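
The same per-customer computation can also be sketched with base R's `ave()`, which applies a function within each group and returns the values in the original row order, so no `split`/`rbind` round trip is needed. This is an alternative to the code above, not the answer's own method; the data frame name `z` and the UTC time zone are assumptions for illustration:

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# as.numeric() on POSIXct gives seconds since the epoch, so within each
# customer we difference the seconds and divide by 86400 to get days.
z$timeDelta <- ave(
  as.numeric(z$time),
  z$customerID,
  FUN = function(secs) c(NA, diff(secs)) / 86400
)

print(z)
```

Because `ave()` groups by value rather than by adjacency, this variant only needs the rows to be in time order within each customer; the customers themselves may be interleaved.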
answered 2013-08-18T12:59:20.937
1

With the help of firstobs from the doBy package:

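library(doBy)
# gap to the previous row, converted to days explicitly
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))
# the first row of each customer has no previous visit
z$timeDelta[firstobs(z$customerID)] <- NA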
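
If an extra package dependency is not desirable, base R's `duplicated()` can mark the first row of each customer in a similar way. A sketch, assuming the rows are already grouped by customerID and ordered by time; the UTC time zone is an assumption made only for reproducibility:

```r
# Sample data from the question; tz = "UTC" is an assumption for reproducibility
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Gap to the previous row in days; the very first row has no predecessor
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))

# !duplicated() is TRUE at the first occurrence of each customerID,
# which is where the gap would otherwise span two different customers
z$timeDelta[!duplicated(z$customerID)] <- NA

print(z)
```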
answered 2013-08-18T13:46:11.883