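I am programming in R. I have a dataset of more than 3 million rows with the columns Vendor_id, Bank_account_no and Date. For each vendor_id, I want to get the rows where Bank_account_no changes, within three months, in a pattern such as X, X, X (at least three times, possibly more), then Y (only once), then back to X. The changes in the dataset are random, so the window is not a fixed number of rows per vendor_id. I used the rle function to get the run lengths of the different Bank_account_no values. Since I want to run this logic for every vendor_id, I am not sure how to build it in R for this many rows. Perhaps data.table could help with this. The input looks like this: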
Vendor_ID Bank_account_no Date
dddd X 24-12-2018
dddd X 24-12-2018
dddd X 26-12-2018
dddd Y 27-12-2018
dddd X 28-12-2018
dddd X 29-12-2018
dddd X 29-12-2018
dddd X 31-12-2018
dddd X 24-01-2019
dddd Z 25-01-2019
dddd X 28-01-2019
dddd G 28-01-2019
dddd G 28-01-2019
eeee A 30-01-2019
eeee A 31-01-2019
eeee A 31-01-2019
eeee B 31-01-2019
eeee A 31-01-2019
The output should be:
Vendor_ID Bank_account_no Date Case
dddd X 24-12-2018 Case1
dddd X 24-12-2018 Case1
dddd X 26-12-2018 Case1
dddd Y 27-12-2018 Case1
dddd X 28-12-2018 Case1
dddd X 29-12-2018 Case2
dddd X 29-12-2018 Case2
dddd X 31-12-2018 Case2
dddd X 24-01-2019 Case2
dddd Z 25-01-2019 Case2
dddd X 28-01-2019 Case2
eeee A 30-01-2019 Case3
eeee A 31-01-2019 Case3
eeee A 31-01-2019 Case3
eeee B 31-01-2019 Case3
eeee A 31-01-2019 Case3
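
To make the rule concrete, here is a rough sketch of how the rle idea might be combined with data.table, one vendor at a time. It is only an illustration under several assumptions: `find_cases`, `min_run`, `max_days` and `local_case` are hypothetical names, "three months" is approximated as 92 days between the first and last row of a case, the rows are assumed to be already in the intended order within each vendor (same-day ties cannot be resolved from Date alone), and a row that closes one case is not reused to start the next.

```r
library(data.table)

# Sample input from the question, assumed to be already ordered within each vendor.
dt <- data.table(
  Vendor_ID = rep(c("dddd", "eeee"), times = c(13, 5)),
  Bank_account_no = c("X", "X", "X", "Y", "X", "X", "X", "X", "X", "Z", "X", "G", "G",
                      "A", "A", "A", "B", "A"),
  Date = as.Date(c("24-12-2018", "24-12-2018", "26-12-2018", "27-12-2018",
                   "28-12-2018", "29-12-2018", "29-12-2018", "31-12-2018",
                   "24-01-2019", "25-01-2019", "28-01-2019", "28-01-2019",
                   "28-01-2019", "30-01-2019", "31-01-2019", "31-01-2019",
                   "31-01-2019", "31-01-2019"), format = "%d-%m-%Y")
)

# Hypothetical helper: for one vendor, return a per-row case number (NA = no case).
# A case is: >= min_run rows of one account, exactly one row of another account,
# then the first row back on the original account, all within max_days days.
find_cases <- function(acct, date, min_run = 3L, max_days = 92L) {
  r      <- rle(as.character(acct))
  ends   <- cumsum(r$lengths)          # last row index of each run
  starts <- ends - r$lengths + 1L      # first row index of each run
  out    <- rep(NA_integer_, length(acct))
  n_case <- 0L
  used   <- 0L                         # last row already assigned to a case
  i      <- 1L
  while (i + 2L <= length(r$lengths)) {
    first <- max(starts[i], used + 1L) # skip rows consumed by the previous case
    last  <- starts[i + 2L]            # first row after the one-off change
    if (ends[i] - first + 1L >= min_run &&
        r$lengths[i + 1L] == 1L &&
        r$values[i + 2L] == r$values[i] &&
        as.numeric(date[last] - date[first]) <= max_days) {
      n_case <- n_case + 1L
      out[first:last] <- n_case
      used <- last
      i <- i + 2L                      # the returning run may start the next case
    } else {
      i <- i + 1L
    }
  }
  out
}

# Label rows per vendor, keep only rows that belong to a case,
# then number the cases globally in order of appearance.
dt[, local_case := find_cases(Bank_account_no, Date), by = Vendor_ID]
res <- dt[!is.na(local_case)]
res[, Case := paste0("Case", .GRP), by = .(Vendor_ID, local_case)]
res[, local_case := NULL]
print(res)
```

On the sample input this labels the same 16 rows as the expected output above and leaves out the two trailing G rows. The loop runs over runs rather than rows, so its cost per vendor depends on how often the account changes; whether that is fast enough for 3 million rows would need to be measured.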