我的数据集:
+--------------------+----------+----------+-------------------+--------------------+--------+-------+---------+--------------------+-------------------+-------------------+-----------+----+---------------+-----------------+-----------+------------+----------------------+------------------+---------+
| event_time|event_type|product_id| category_id| category_code| brand| price| user_id| user_session| Event_time_NoUTC| Event_timestamp|day_of_week|hour|primaryCategory|secondaryCategory|eventVisits|productCount|secondaryCategoryCount| AvgCatExpense|SessCount|
+--------------------+----------+----------+-------------------+--------------------+--------+-------+---------+--------------------+-------------------+-------------------+-----------+----+---------------+-----------------+-----------+------------+----------------------+------------------+---------+
|2019-10-06 07:04:...| view| 1004565|2053013555631882655|electronics.smart...| huawei| 169.84|231943435|428ebb99-3568-4e1...|2019-10-06 07:04:50|2019-10-06 07:04:50| 1| 7| electronics| smartphone| 1| 1| 1| 380.2349402627628| 1|
|2019-10-25 03:50:...| view| 5100337|2053013553341792533| electronics.clocks| apple| 319.34|266287781|f55edf02-3fd4-48f...|2019-10-25 03:50:28|2019-10-25 03:50:28| 6| 3| electronics| clocks| 7| 7| 7| 369.7054359810376| 4|
|2019-10-25 03:52:...| view| 1005105|2053013555631882655|electronics.smart...| apple|1397.09|266287781|118dbcd6-fe31-4cc...|2019-10-25 03:52:09|2019-10-25 03:52:09| 6| 3| electronics| smartphone| 7| 7| 7| 369.7054359810376| 4|
|2019-10-26 12:15:...| view| 6000157|2053013560807654091|auto.accessories....|starline| 91.12|266287781|992d03b4-c561-4fb...|2019-10-26 12:15:56|2019-10-26 12:15:56| 7| 12| auto| accessories| 7| 7| 7| 369.7054359810376| 4|
事件类型分为三类:查看、购物车和购买。我想用一个新列 is_purchased=1 对 user_id 和 product_id 进行分类,如果它的事件类型为购买,其他为 0。之后,我将删除冗余行,如下所示,这基本上可以帮助我分类我的数据是否客户是否会流失。
我正在考虑使用 user_id 和 product_id 对数据进行分区,然后对已购买的数据进行分类。请建议您解决此问题的方法?