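I have some clickstream data that I want to run a particular kind of attribution analysis on, but that analysis needs its input in a specific format for both converting and non-converting users.
Sample data (reprex):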
df <- structure(list(User_ID = c(2001, 2001, 2001, 2002, 2001, 2002,
2001, 2002, 2002, 2003, 2003, 2001, 2002, 2002, 2001), Session_ID = c("1001",
"1002", "1003", "1004", "1005", "1006", "1007", "Not Set", "Not Set",
"Not Set", "Not Set", "Not Set", "1008", "1009", "Not Set"),
Date_time = structure(c(1540103940, 1540104060, 1540104240,
1540318080, 1540318680, 1540318859, 1540314360, 1540413060,
1540413240, 1540538460, 1540538640, 1540629660, 1540755060,
1540755240, 1540803000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Source = c("Facebook", "Facebook", "Facebook", "Google",
"Email", "Google", "Email", "Referral", "Referral", "Facebook",
"Facebook", "Google", "Referral", "Direct", "Direct"), Conversion = c(0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -15L), spec = structure(list(
cols = list(User_ID = structure(list(), class = c("collector_double",
"collector")), Session_ID = structure(list(), class = c("collector_character",
"collector")), Date_time = structure(list(format = ""), class = c("collector_datetime",
"collector")), Source = structure(list(), class = c("collector_character",
"collector")), Conversion = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Then set the column classes:
library(dplyr)  # provides %>% and mutate()

df <- df %>%
  mutate(User_ID = as.factor(User_ID),
         Session_ID = as.factor(Session_ID),
         Date_time = as.POSIXct(Date_time))
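I want to get every visit path that ended in a purchase, as well as the total path of any journey that did not lead to a purchase.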
The new column, path, should have this format:
e.g. Facebook > Facebook > Facebook > Email > Email
For user 2001 on its own, I know I can do this with
mutate(path = paste0(Source, collapse = " > "))
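Extending that one-liner from user 2001 to every user is a matter of grouping first. A minimal sketch, assuming dplyr >= 1.0 (for summarise()'s .groups argument) and rebuilding the reprex as a plain data.frame without the readr spec attributes; it ignores conversions entirely, so it yields only each user's total path, in the rows' existing order:

```r
library(dplyr)

# The question's reprex, rebuilt without the readr spec attributes
df <- data.frame(
  User_ID    = c(2001, 2001, 2001, 2002, 2001, 2002, 2001, 2002,
                 2002, 2003, 2003, 2001, 2002, 2002, 2001),
  Session_ID = c("1001", "1002", "1003", "1004", "1005", "1006", "1007",
                 "Not Set", "Not Set", "Not Set", "Not Set", "Not Set",
                 "1008", "1009", "Not Set"),
  Date_time  = as.POSIXct(c(1540103940, 1540104060, 1540104240, 1540318080,
                            1540318680, 1540318859, 1540314360, 1540413060,
                            1540413240, 1540538460, 1540538640, 1540629660,
                            1540755060, 1540755240, 1540803000),
                          origin = "1970-01-01", tz = "UTC"),
  Source     = c("Facebook", "Facebook", "Facebook", "Google", "Email",
                 "Google", "Email", "Referral", "Referral", "Facebook",
                 "Facebook", "Google", "Referral", "Direct", "Direct"),
  Conversion = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# One row per user: every Source in row order, joined with " > "
paths_per_user <- df %>%
  group_by(User_ID) %>%
  summarise(path = paste(Source, collapse = " > "), .groups = "drop")

paths_per_user
```

For user 2001 this gives Facebook > Facebook > Facebook > Email > Email > Google > Direct, i.e. both conversions merged into a single path, which is exactly why the journeys need to be split.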
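The complications are:
- Most session IDs are "Not Set", meaning they are missing
- Some users may convert more than once
- Some users may convert and then come back later without converting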
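Because most Session_ID values are "Not Set", sessions can't mark where one journey ends; the conversions can. One common approach (a sketch, not the only one): within each user, a new journey starts on the row after each conversion, so a running sum of the previous row's Conversion gives a journey index. This covers repeat conversions and a trailing non-converting return visit alike. It assumes rows are already in visit order within each user, and the journey column name is purely illustrative:

```r
library(dplyr)

# The question's reprex, rebuilt without the readr spec attributes
df <- data.frame(
  User_ID    = c(2001, 2001, 2001, 2002, 2001, 2002, 2001, 2002,
                 2002, 2003, 2003, 2001, 2002, 2002, 2001),
  Session_ID = c("1001", "1002", "1003", "1004", "1005", "1006", "1007",
                 "Not Set", "Not Set", "Not Set", "Not Set", "Not Set",
                 "1008", "1009", "Not Set"),
  Date_time  = as.POSIXct(c(1540103940, 1540104060, 1540104240, 1540318080,
                            1540318680, 1540318859, 1540314360, 1540413060,
                            1540413240, 1540538460, 1540538640, 1540629660,
                            1540755060, 1540755240, 1540803000),
                          origin = "1970-01-01", tz = "UTC"),
  Source     = c("Facebook", "Facebook", "Facebook", "Google", "Email",
                 "Google", "Email", "Referral", "Referral", "Facebook",
                 "Facebook", "Google", "Referral", "Direct", "Direct"),
  Conversion = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# journey = number of conversions strictly before this row, within the user.
# dplyr::lag() is called explicitly because stats::lag() behaves differently.
segmented <- df %>%
  group_by(User_ID) %>%
  mutate(journey = cumsum(dplyr::lag(Conversion, default = 0))) %>%
  ungroup()

segmented %>% select(User_ID, Source, Conversion, journey)
```

User 2001's Google and Direct visits after the Email conversion get journey 1, as do user 2002's Referral and Direct visits after its conversion; user 2003 stays at journey 0 throughout.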
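Each row will be either:
- A conversion, by user ID. Most converting users convert only once, but some may convert several times, in which case there is one row per conversion. The path column reflects the journey to that conversion; for a user's second or later conversion, it shows only the path since the previous conversion.
- Or a non-converting user journey, with its total path in the format above.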
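Combining a per-user journey index with one summarise() per journey gives one row per conversion plus one row per unconverted journey. A minimal sketch assuming dplyr >= 1.0; journey is a hypothetical helper column dropped at the end. It relies on the rows already being in visit order within each user and deliberately does not sort by Date_time first: in the reprex, user 2001's converting Email row (17:06) is timestamped before the non-converting Email row (18:18), so sorting by time would move the 18:18 Email into user 2001's second journey.

```r
library(dplyr)

# The question's reprex, rebuilt without the readr spec attributes
df <- data.frame(
  User_ID    = c(2001, 2001, 2001, 2002, 2001, 2002, 2001, 2002,
                 2002, 2003, 2003, 2001, 2002, 2002, 2001),
  Session_ID = c("1001", "1002", "1003", "1004", "1005", "1006", "1007",
                 "Not Set", "Not Set", "Not Set", "Not Set", "Not Set",
                 "1008", "1009", "Not Set"),
  Date_time  = as.POSIXct(c(1540103940, 1540104060, 1540104240, 1540318080,
                            1540318680, 1540318859, 1540314360, 1540413060,
                            1540413240, 1540538460, 1540538640, 1540629660,
                            1540755060, 1540755240, 1540803000),
                          origin = "1970-01-01", tz = "UTC"),
  Source     = c("Facebook", "Facebook", "Facebook", "Google", "Email",
                 "Google", "Email", "Referral", "Referral", "Facebook",
                 "Facebook", "Google", "Referral", "Direct", "Direct"),
  Conversion = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# One row per journey: a journey ends at a conversion or at the user's last visit
result <- df %>%
  group_by(User_ID) %>%
  mutate(journey = cumsum(dplyr::lag(Conversion, default = 0))) %>%
  group_by(User_ID, journey) %>%
  summarise(
    Session_ID = last(Session_ID),  # session of the journey's final visit
    Date_time  = last(Date_time),   # time of the conversion or final visit
    Conversion = max(Conversion),   # 1 if the journey ended in a conversion
    Path       = paste(Source, collapse = " > "),
    .groups    = "drop"
  ) %>%
  arrange(Date_time) %>%
  select(-journey)

result
```

On the reprex this reproduces the five rows of the expected output below, in the same order. If the real data is not stored in visit order, an arrange(User_ID, Date_time) before the first group_by() would be needed.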
For the reprex above, the result would look like this:
# A tibble: 5 x 5
  User_ID Session_ID Date_time           Conversion Path
    <dbl> <chr>      <dttm>                   <dbl> <chr>
1    2001 1007       2018-10-23 17:06:00          1 Facebook > Facebook > Facebook > Email > Email
2    2002 Not Set    2018-10-24 20:34:00          1 Google > Google > Referral > Referral
3    2003 Not Set    2018-10-26 07:24:00          0 Facebook > Facebook
4    2002 1009       2018-10-28 19:34:00          0 Referral > Direct
5    2001 Not Set    2018-10-29 08:50:00          1 Google > Direct
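... where:
- User 2001 converted twice, and each conversion path is shown separately;
- User 2002 converted, then came back later without converting, so the converting and non-converting paths are shown as separate rows;
- User 2003 never converted, so their total path is shown.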
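The three cases above follow one counting rule: a user gets one row per conversion, plus one more row if their last visit was not a conversion. A small sketch of that arithmetic on the reprex, assuming rows are in visit order within each user (expected_rows and rows are illustrative names):

```r
library(dplyr)

# Only the columns the rule needs, taken from the question's reprex
df <- data.frame(
  User_ID    = c(2001, 2001, 2001, 2002, 2001, 2002, 2001, 2002,
                 2002, 2003, 2003, 2001, 2002, 2002, 2001),
  Conversion = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1)
)

expected_rows <- df %>%
  group_by(User_ID) %>%
  summarise(
    conversions = sum(Conversion),
    # an unconverted tail exists when the user's final row is not a conversion
    rows = sum(Conversion) + (last(Conversion) == 0),
    .groups = "drop"
  )

expected_rows
```

This gives 2, 2 and 1 rows for users 2001, 2002 and 2003, which adds up to the 5 rows of the tibble above.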