r - R finding date intervals by ID

Question

All of this data is in LONG format, meaning a single Customer ID can have multiple line items.

I can get the first and last dates using R DateDiff, but converting the file to WIDE format with plyr still leaves me with the same problem of multiple orders per customer, just with fewer rows and more columns.

Is there an R function that extends R DateDiff to work out the time interval between purchases by Customer ID? That is, the time between order 1 and 2, order 2 and 3, and so on, assuming these orders exist.

CID  Order.Date  Order.DateMY  Order.No_  Amount  Quantity  Category.Name  Locality
1    26/02/13    Feb-13        zzzzz              1         r              MOSMAN
1    26/05/13    May-13        qqqqq              1         x              CHULLORA
1    28/05/13    May-13        wwwww              1         r              MOSMAN
1    28/05/13    May-13        wwwww              1         x              MOSMAN
2    19/08/13    Aug-13        wwwwww             1         o              OAKLEIGH SOUTH
3    3/01/13     Jan-13        wwwwww             1         x              CURRENCY CREEK
4    28/08/13    Aug-13        eeeeeee            1         t              BRISBANE
4    10/09/13    Sep-13        rrrrrrrrr          1         y              BRISBANE
4    25/09/13    Sep-13        tttttttt           2         e              BRISBANE
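To make the answers below easy to reproduce, here is a minimal sketch that reads part of the sample table into a data frame called `DF` (the name the first answer assumes). It keeps only the CID, Order.Date and Order.No_ columns; this subset is an assumption for illustration, since the Amount column appears to be blank in the posted sample and the remaining columns are not needed to compute intervals.

```r
# Subset of the question's sample data (CID, Order.Date, Order.No_ only).
# Order.Date is kept as character so each answer can parse it itself.
DF <- read.table(text = "
CID Order.Date Order.No_
1   26/02/13   zzzzz
1   26/05/13   qqqqq
1   28/05/13   wwwww
1   28/05/13   wwwww
2   19/08/13   wwwwww
3   3/01/13    wwwwww
4   28/08/13   eeeeeee
4   10/09/13   rrrrrrrrr
4   25/09/13   tttttttt
", header = TRUE, stringsAsFactors = FALSE)

str(DF)
```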

score 2 · Accepted Answer

Since you haven't given the expected result, it isn't clear what you want to do. But I guess you want the interval between two orders.

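library(data.table)
DT <- as.data.table(DF)  # DF: the question's data
# Parse the dd/mm/yy strings, then sort each customer's rows by date
DT[, Date := as.Date(Order.Date, '%d/%m/%y')]
setorder(DT, CID, Date)
# Days since the previous row for the same customer (0 for the first)
DT[, list(Order.Date, diff = c(0, diff(Date))), by = CID]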

   CID Order.Date diff
1:   1   26/02/13    0
2:   1   26/05/13   89
3:   1   28/05/13    2
4:   1   28/05/13    0
5:   2   19/08/13    0
6:   3    3/01/13    0
7:   4   28/08/13    0
8:   4   10/09/13   13
9:   4   25/09/13   15
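One thing to watch in the output above: CID 1's two rows dated 28/05/13 share order number wwwww, so the 0 on row 4 comes from two line items of the same order rather than from a second purchase. The following is a minimal base R sketch, assuming Order.No_ identifies one order within a customer. It collapses line items to one row per order before differencing and uses NA instead of 0 for each customer's first order. The column name `DaysSincePrev` is hypothetical.

```r
# Same subset of the question's data (CID, Order.Date, Order.No_)
DF <- data.frame(
  CID        = c(1, 1, 1, 1, 2, 3, 4, 4, 4),
  Order.Date = c("26/02/13", "26/05/13", "28/05/13", "28/05/13", "19/08/13",
                 "3/01/13", "28/08/13", "10/09/13", "25/09/13"),
  Order.No_  = c("zzzzz", "qqqqq", "wwwww", "wwwww", "wwwwww",
                 "wwwwww", "eeeeeee", "rrrrrrrrr", "tttttttt"),
  stringsAsFactors = FALSE
)

# One row per (customer, order): repeated line items of an order are dropped
orders <- unique(DF[, c("CID", "Order.No_", "Order.Date")])
orders$Date <- as.Date(orders$Order.Date, format = "%d/%m/%y")

# Sort by customer, then date, so differences follow purchase order
orders <- orders[order(orders$CID, orders$Date), ]

# Days since the same customer's previous order; NA for the first order
orders$DaysSincePrev <- ave(as.numeric(orders$Date), orders$CID,
                            FUN = function(d) c(NA, diff(d)))
orders
```

`ave()` returns a vector aligned with the input rows, so the result can be stored directly as a column without reshaping to WIDE format.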

score 1 · Accepted Answer

Split the data frame and find the intervals for each customer ID.

df <- data.frame(customerID = as.factor(c(rep("A", 3), rep("B", 4))),
                 OrderDate  = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                                        "2013-06-01", "2013-06-02", "2013-06-03",
                                        "2013-07-01")))

# One data frame per customer
dfs <- split(df, df$customerID)
# Gaps between consecutive orders (difftime, in days); sort in case rows are out of order
lapply(dfs, function(x) diff(sort(x$OrderDate)))
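`lapply()` returns a list with one element per customer. If you would rather keep the gaps as a column of `df`, one option is base R's `unsplit()`: pad each customer's differences with a leading NA and reassemble the pieces in the original row order. This is a sketch that assumes rows are already in date order within each customer, as in the example; `gaps` and `DaysSincePrev` are hypothetical names.

```r
df <- data.frame(customerID = as.factor(c(rep("A", 3), rep("B", 4))),
                 OrderDate  = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                                        "2013-06-01", "2013-06-02", "2013-06-03",
                                        "2013-07-01")))

dfs <- split(df, df$customerID)

# Days between consecutive orders, with NA for each customer's first order
gaps <- lapply(dfs, function(x) c(NA, as.numeric(diff(x$OrderDate))))

# Put the per-customer pieces back into df's original row order
df$DaysSincePrev <- unsplit(gaps, df$customerID)
df
```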

Or, using plyr:

library(plyr)
dfs <- dlply(df, .(customerID), function(x) diff(x$OrderDate))

score 0 · Accepted Answer

I know this question is old, but I just came up with another approach and wanted to document it:

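library(dplyr)
library(lubridate)
# df: the example data frame from the previous answer
df %>%
  arrange(customerID, OrderDate) %>%
  group_by(customerID) %>%
  # days since the same customer's previous order (NA for the first one)
  mutate(SinceLast = time_length(interval(lag(OrderDate), OrderDate), unit = "day"))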

# A tibble: 7 x 3
# Groups:   customerID [2]
  customerID OrderDate  SinceLast
  <fct>      <date>         <dbl>
1 A          2013-07-01       NA 
2 A          2013-07-02        1.
3 A          2013-07-03        1.
4 B          2013-06-01       NA 
5 B          2013-06-02        1.
6 B          2013-06-03        1.
7 B          2013-07-01       28.
