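I am trying to store rank values by cycle. Going from rank 1 to rank 2 is cycle1, similarly going from rank 2 to rank 3 is cycle2, and so on. I want to create a binary value for each cycle (as shown below).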

Data frame before:

id               event              date                   rank       
1241a21ef        one             2016-08-13 20:03:37         1
1241a21ef        two             2016-08-15 05:41:09         2
12426203b        two             2016-08-04 05:35:10         1
12426203b       three            2016-08-06 02:07:41         2
12426203b        two             2016-08-10 05:42:33         3
12426203b       three            2016-08-14 02:43:16         4

Data frame after:

id           cycle1     cycle2   cycle3
1241a21ef      1          0         0
12426203b      1          1         1

Note: each group (i.e. id) has unique rank values based on the timestamp, and the rank resets to 1 for each new id.

1 Answer

You can use dplyr::count with tidyr::spread to get the data in the desired format, as follows:

library(dplyr)
library(tidyr)

df %>% group_by(id) %>%
  arrange(id, rank) %>%   
  filter(rank != last(rank)) %>%   #drop last rank for each id
  mutate(cycle = paste0("cycle", rank)) %>%  #desired column names after spread
  group_by(id, cycle) %>%
  count() %>%
  spread(key = cycle, value = n, fill = 0) %>%
  as.data.frame()

#          id cycle1 cycle2 cycle3
# 1 1241a21ef      1      0      0
# 2 12426203b      1      1      1
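
tidyr::spread has since been superseded by tidyr::pivot_wider, so the same reshaping can be sketched with the newer function. This is only an alternative under assumptions, not part of the original answer: it assumes tidyr 1.0.0 or later, rebuilds just the id and rank columns inline so it runs on its own, and the names value and res are illustrative.

```r
library(dplyr)
library(tidyr)

# Only id and rank are needed for the reshaping (copied from the question)
df <- data.frame(
  id = c("1241a21ef", "1241a21ef",
         "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  filter(rank < max(rank)) %>%      # the highest rank in each id starts no cycle
  ungroup() %>%
  transmute(id,
            cycle = paste0("cycle", rank),  # column names after widening
            value = 1L) %>%                 # flag: this cycle exists for this id
  pivot_wider(names_from = cycle,
              values_from = value,
              values_fill = list(value = 0L)) %>%  # cycles an id never reached
  as.data.frame()

res
```

As with the spread version, an id that has only a single rank loses all of its rows in the filter step and so does not appear in the result.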

Data:

df <- read.table(text =
"id               event              date                   rank       
1241a21ef        one             '2016-08-13 20:03:37'         1
1241a21ef        two             '2016-08-15 05:41:09'         2
12426203b        two             '2016-08-04 05:35:10'         1
12426203b       three            '2016-08-06 02:07:41'         2
12426203b        two             '2016-08-10 05:42:33'         3
12426203b       three            '2016-08-14 02:43:16'         4",
header = TRUE, stringsAsFactors = FALSE)
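
The question notes that rank follows the timestamp within each id and restarts at 1 for every new id. If that column were missing, it could be rebuilt roughly like this. This is a sketch under assumptions: the timestamps are assumed to be in the "%Y-%m-%d %H:%M:%S" format shown above, the UTC time zone is an arbitrary choice, and the name ranked is illustrative.

```r
library(dplyr)

# id and date columns copied from the question
df <- data.frame(
  id = c("1241a21ef", "1241a21ef",
         "12426203b", "12426203b", "12426203b", "12426203b"),
  date = c("2016-08-13 20:03:37", "2016-08-15 05:41:09",
           "2016-08-04 05:35:10", "2016-08-06 02:07:41",
           "2016-08-10 05:42:33", "2016-08-14 02:43:16"),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  mutate(date = as.POSIXct(date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%  # oldest event first within each id
  mutate(rank = row_number()) %>%      # numbering restarts at 1 for every id
  ungroup() %>%
  as.data.frame()

ranked
```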
answered 2018-05-12T05:36:09.280