I have a table with the following key columns: customer ID | order ID | product ID | Quantity | Amount | Order Date.

All of the data is in long format, so a single customer ID can have multiple line items.

I can get the first and last order dates using R DateDiff, but converting the file to wide format with plyr leaves me with the same problem of multiple orders per customer, just with fewer rows and more columns.

Is there an R function that extends R DateDiff to get the time interval between purchases for each customer ID? That is, the time between orders 1 and 2, orders 2 and 3, and so on, assuming these orders exist.

CID  Order.Date  Order.DateMY  Order.No_  Amount  Quantity  Category.Name  Locality
1    26/02/13    Feb-13        zzzzz              1         r              MOSMAN
1    26/05/13    May-13        qqqqq              1         x              CHULLORA
1    28/05/13    May-13        wwwww              1         r              MOSMAN
1    28/05/13    May-13        wwwww              1         x              MOSMAN
2    19/08/13    Aug-13        wwwwww             1         o              OAKLEIGH SOUTH
3    3/01/13     Jan-13        wwwwww             1         x              CURRENCY CREEK
4    28/08/13    Aug-13        eeeeeee            1         t              BRISBANE
4    10/09/13    Sep-13        rrrrrrrrr          1         y              BRISBANE
4    25/09/13    Sep-13        tttttttt           2         e              BRISBANE
4

3 Answers


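Since you did not give the expected result, it is not clear what you want to do. But I guess you want the interval between consecutive orders.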

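library(data.table)
DT <- as.data.table(DF)  # DF is the data frame from the question
# Per customer (CID): days since the previous order, 0 for the first order.
# The dates are sorted before diff(), so this assumes the rows of each CID
# are already in date order; otherwise Order.Date and diff will not line up.
DT[, list(Order.Date,
          diff = c(0, diff(sort(as.Date(Order.Date, '%d/%m/%y'))))), by = CID]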

   CID Order.Date diff
1:   1   26/02/13    0
2:   1   26/05/13   89
3:   1   28/05/13    2
4:   1   28/05/13    0
5:   2   19/08/13    0
6:   3    3/01/13    0
7:   4   28/08/13    0
8:   4   10/09/13   13
9:   4   25/09/13   15
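
The 0 in row 4 of the output above comes from two line items of the same order (wwwww, both dated 28/05/13), not from two separate purchases. If the goal is the time between distinct orders, one option is to collapse the data to one row per order first. The following is only a sketch in base R, not part of the original answer: the data frame DF is a hypothetical reconstruction of a few rows from the question, and gap_days is a column name made up for illustration.

```r
# Hypothetical reconstruction of part of the question's sample data
DF <- data.frame(
  CID        = c(1, 1, 1, 1, 4, 4, 4),
  Order.Date = c("26/02/13", "26/05/13", "28/05/13", "28/05/13",
                 "28/08/13", "10/09/13", "25/09/13"),
  Order.No_  = c("zzzzz", "qqqqq", "wwwww", "wwwww",
                 "eeeeeee", "rrrrrrrrr", "tttttttt"),
  stringsAsFactors = FALSE
)

# One row per (customer, order): drops the repeated line item of order wwwww
orders <- unique(DF[, c("CID", "Order.No_", "Order.Date")])
orders$Date <- as.Date(orders$Order.Date, format = "%d/%m/%y")
orders <- orders[order(orders$CID, orders$Date), ]

# Days since the same customer's previous order; NA for the first order
orders$gap_days <- ave(as.numeric(orders$Date), orders$CID,
                       FUN = function(d) c(NA_real_, diff(d)))
orders
```

Using NA rather than 0 for the first order keeps "no previous order" distinguishable from "ordered again on the same day".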
Answered 2013-07-01T09:33:06.530

Split the data frame and find the intervals for each customer ID.

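# Example data: customer A has 3 orders, customer B has 4
df <- data.frame(customerID = as.factor(c(rep("A", 3), rep("B", 4))),
                 OrderDate = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                                       "2013-06-01", "2013-06-02", "2013-06-03",
                                       "2013-07-01")))

# One data frame per customer, then the gaps between consecutive order dates
dfs <- split(df, df$customerID)
lapply(dfs, function(x) diff(x$OrderDate))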

Or use plyr:

library(plyr)
dfs <- dlply(df,.(customerID),function(x)return(diff(x$OrderDate)))
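
Both variants return a list with one vector of intervals per customer. To get closer to what the question asks for (the time between orders 1 and 2, 2 and 3, and so on), the list can be flattened into one row per interval. This is a minimal sketch in base R using the same example data; the column names from_order, to_order, and days are made up for illustration.

```r
df <- data.frame(
  customerID = factor(c(rep("A", 3), rep("B", 4))),
  OrderDate  = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                         "2013-06-01", "2013-06-02", "2013-06-03",
                         "2013-07-01"))
)

gaps <- do.call(rbind, lapply(split(df, df$customerID), function(x) {
  d <- sort(x$OrderDate)            # order dates of one customer, oldest first
  n <- max(length(d) - 1L, 0L)      # number of intervals (0 for a single order)
  data.frame(
    customerID = rep(x$customerID[1], n),
    from_order = seq_len(n),        # interval runs from order i ...
    to_order   = seq_len(n) + 1,    # ... to order i + 1
    days       = as.numeric(diff(d), units = "days")
  )
}))
rownames(gaps) <- NULL
gaps
```

A customer with only one order contributes no rows here, which matches the "assuming these orders exist" condition in the question.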
Answered 2013-07-01T08:31:29.183

I know this question is old, but I just came up with another way to do it and wanted to document it:

> library(dplyr)
> library(lubridate)
> df %>% group_by(customerID) %>% 
    mutate(SinceLast=(interval(ymd(lag(OrderDate)),ymd(OrderDate)))/86400)

# A tibble: 7 x 3
# Groups:   customerID [2]
  customerID OrderDate  SinceLast
  <fct>      <date>         <dbl>
1 A          2013-07-01       NA 
2 A          2013-07-02        1.
3 A          2013-07-03        1.
4 B          2013-06-01       NA 
5 B          2013-06-02        1.
6 B          2013-06-03        1.
7 B          2013-07-01       28.
Answered 2018-04-05T04:04:51.813