我在下面有一个data.frame,我想“分块”时间段,以便对于每个company_id,它将时间段“折叠”到相隔三十天的时间段。
company_id startDate endDate
1 209952 2012-09-17 2012-10-17
2 209952 2012-10-17 2012-11-17
3 209952 2012-11-17 2012-12-17
4 209952 2012-12-17 2013-01-17
5 209952 2013-01-17 2013-02-17
6 209952 2013-02-17 2013-03-17
7 209952 2013-03-17 2013-04-17
8 209952 2013-04-17 2013-05-17
9 209952 2013-05-17 2013-06-17
10 209952 2013-06-17 2013-07-17
11 209952 2013-07-17 2013-08-17
12 209952 2013-08-17 2013-09-17
13 209952 2013-09-17 2013-10-17
14 209952 2013-10-17 2013-11-17
15 209952 2013-11-17 2013-12-17
16 209952 2013-12-17 2014-01-17
17 209952 2014-01-17 2014-02-17
18 209952 2014-02-12 2014-03-12
19 209952 2014-03-12 2014-04-12
20 209952 2014-04-12 2014-05-12
21 209952 2014-05-12 2014-06-12
22 209952 2014-06-12 2014-07-12
23 209952 2014-07-12 2014-08-12
24 209952 2014-08-12 2014-09-12
25 209952 2014-09-12 2014-10-12
26 209952 2014-10-12 2014-11-12
27 209952 2014-11-12 2014-12-12
28 209952 2014-12-12 2015-01-12
29 209952 2015-01-12 2015-02-12
30 209952 2015-02-12 2015-03-12
31 209952 2015-03-12 2015-04-12
32 209952 2015-04-13 2015-05-13
33 209952 2015-05-07 2016-05-07
34 209952 2015-05-07 2015-06-07
35 209952 2015-06-07 2015-07-07
36 209952 2015-07-07 2015-08-07
37 209952 2015-08-07 2015-09-07
38 209952 2016-05-07 2017-10-23
39 2802315 2012-10-19 2012-11-19
40 2802315 2012-11-19 2012-12-19
41 2802315 2012-12-19 2013-01-19
42 2802315 2013-01-19 2013-02-19
43 2802315 2013-02-19 2013-03-19
44 2802315 2013-03-19 2013-04-19
45 2802315 2013-04-19 2013-05-19
46 2802315 2013-05-19 2013-06-19
47 2802315 2013-06-19 2013-07-19
48 2802315 2013-07-19 2013-08-19
49 2802315 2013-08-19 2013-09-19
50 2802315 2013-09-19 2013-10-19
51 2802315 2013-10-19 2013-11-19
52 2802315 2013-11-18 2013-12-18
53 2802315 2013-12-18 2014-01-18
54 2802315 2014-01-18 2014-02-18
55 2802315 2014-02-18 2014-03-18
56 2802315 2014-03-18 2014-04-18
57 2802315 2014-04-18 2014-05-18
58 2802315 2014-09-29 2014-10-29
59 2802315 2014-10-29 2014-11-29
60 2802315 2015-04-22 2015-05-22
61 2802315 2015-05-21 2015-06-21
62 2802315 2015-06-23 2015-09-23
63 2802315 2015-07-23 2015-08-23
64 2802315 2015-11-23 2016-05-23
我尝试了以下方法:
test <- blocks %>%
filter(company_id %in% c(209952, 2802315)) %>%
arrange(company_id, startDate) %>%
group_by(company_id) %>%
mutate(
week = cumsum(startDate - lag(endDate, default = 0) > 30)
) %>%
group_by(company_id, week) %>%
summarize(
startDate = min(startDate),
endDate = max(endDate)
)
问题是第 (1) 行和第 (2) 行的时间间隔应该合并为一个,因此 startDate = 2012-09-17 和 endDate = 2017-10-23 因为两者之间的时间不到三十天。
company_id week startDate endDate
<dbl> <int> <date> <date>
1 209952 1 2012-09-17 2016-05-07
2 209952 2 2016-05-07 2017-10-23
3 2802315 1 2012-10-19 2014-05-18
4 2802315 2 2014-09-29 2014-11-29
5 2802315 3 2015-04-22 2015-09-23
6 2802315 4 2015-11-23 2016-05-23
我正在寻找的输出是
1 209952 1 2012-09-17 2016-10-23
2 2802315 1 2012-10-19 2014-05-18
3 2802315 2 2014-09-29 2014-11-29
4 2802315 3 2015-04-22 2015-09-23
5 2802315 4 2015-11-23 2016-05-23