
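I am trying to check whether a customer makes a purchase every week by creating a new column that indicates whether a purchase occurred in the following week.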

Initial data

id         timestamp           week_no   
b9968     2016-08-17 09:38:33     33
b9968     2016-08-18 17:33:23     33
b9968     2016-08-19 18:25:20     33
b9968     2016-08-23 17:46:44     34
4983f     2016-08-12 12:01:23     32
4983f     2016-08-13 17:30:47     32

Desired output

id         timestamp           week_no  diff1    
b9968     2016-08-17 09:38:33     34     1        
4983f     2016-08-13 17:30:47     32     0

1 Answer


One option is to use dplyr for this.

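Your expected output table is slightly off: its first row pairs the timestamp 2016-08-17 09:38:33 with week_no 34, but in your input that timestamp belongs to week 33.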
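The mismatch can be checked directly from the timestamps. This is only a sketch: it assumes that week_no is the ISO 8601 week number (the `%V` format code), which the question does not state.

```r
# Assumption: week_no is the ISO 8601 week ("%V"); the question does not say so.
ts <- as.POSIXct(c("2016-08-17 09:38:33", "2016-08-23 17:46:44"), tz = "UTC")

# Week number of each timestamp as an integer.
iso_week <- as.integer(format(ts, "%V"))
iso_week  # 33 34
```

Under that assumption, week 34 belongs to the 2016-08-23 purchase, not to the 2016-08-17 one.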

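library(dplyr)
df %>% 
  group_by(id) %>% 
  # difference to the previous purchase's week (NA for a customer's first row)
  mutate(diff1 = week_no - lag(week_no)) %>% 
  # keep only each customer's most recent purchase
  filter(timestamp == max(timestamp))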

# A tibble: 2 x 4
# Groups:   id [2]
  id    timestamp           week_no diff1
  <chr> <dttm>                <int> <int>
1 b9968 2016-08-23 17:46:44      34     1
2 4983f 2016-08-13 17:30:47      32     0
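If the goal is one yes/no flag per customer rather than a per-row difference, a base R variant is possible. The following is a minimal sketch, not the answer's method: the name `bought_next_week` is made up for illustration, and it assumes all purchases fall within the same year, since week numbers wrap around at year boundaries.

```r
# Same toy data as in the question (only id and week_no are needed here).
df <- data.frame(
  id = c("b9968", "b9968", "b9968", "b9968", "4983f", "4983f"),
  week_no = c(33L, 33L, 33L, 34L, 32L, 32L),
  stringsAsFactors = FALSE
)

# One row per customer and week, sorted so diff() compares neighbouring weeks.
weeks <- unique(df[, c("id", "week_no")])
weeks <- weeks[order(weeks$id, weeks$week_no), ]

# Per customer: TRUE if two distinct purchase weeks are exactly one week apart.
# A customer with a single purchase week gets FALSE, because any() of an
# empty vector is FALSE.
bought_next_week <- vapply(
  split(weeks$week_no, weeks$id),
  function(w) any(diff(w) == 1L),
  logical(1)
)
bought_next_week
```

Here b9968 is flagged TRUE (weeks 33 and 34) and 4983f is flagged FALSE (week 32 only). Collapsing to distinct weeks first avoids the zeros that repeated purchases within one week produce in a row-wise difference.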

Data:

df <- structure(list(id = c("b9968", "b9968", "b9968", "b9968", "4983f", 
                      "4983f"), 
               timestamp = structure(c(1471426713, 1471541603, 1471631120, 
                                       1471974404, 1471003283, 1471109447), 
                                     tzone = "UTC", class = c("POSIXct","POSIXt")), 
               week_no = c(33L, 33L, 33L, 34L, 32L, 32L)), 
          .Names = c("id", "timestamp", "week_no"), 
          row.names = c(NA, -6L), 
          class = "data.frame")
Answered 2018-05-12T12:30:47.147