
I have a data set that looks like this:

userId          SessionId        Screen         Platform       Version
01              1                first          IOS            1.0.1
01              1                main           IOS            1.0.1
01              2                first          IOS            1.0.1
01              3                first          IOS            1.0.1
01              3                main           IOS            1.0.1
01              3                detail         IOS            1.0.1
02              1                first          Android        1.0.2

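Basically, what I want to do is determine whether certain "paths" (sequences of different screens) lead to better retention. I would like to collapse each SessionId into a single row, with its screens combined into one column. The ideal data set would look like this: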

userId       SessionId       Path                 Retention
01           1               first;main           3
01           2               first                3
01           3               first;main;detail    3
02           1               first                1

Here the variable Retention equals the maximum SessionId for each user.
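
For anyone who wants to reproduce this, the sample table could be built roughly as follows. This is only a sketch: the name `d` is an assumption (it happens to be the name the answers use), and storing `userId` as character and `SessionId` as integer is my guess at the column types.

```r
# Sample data reconstructed from the table in the question.
# userId is kept as character so the leading zero in "01" survives.
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  Platform  = c(rep("IOS", 6), "Android"),
  Version   = c(rep("1.0.1", 6), "1.0.2"),
  stringsAsFactors = FALSE
)
```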


2 Answers


A possible solution in base R:

d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
# Retention is the highest SessionId per user; assign the result so d2 keeps the new column
d2 <- transform(d2, retention = ave(SessionId, userId, FUN = max))

This gives:

> d2
  userId SessionId              Screen retention
1     01         1         first, main         3
2     02         1               first         1
3     01         2               first         3
4     01         3 first, main, detail         3

Another option, using dplyr:

library(dplyr)
d %>% 
  group_by(userId, SessionId) %>% 
  summarise(Screen = toString(Screen)) %>% 
  group_by(userId) %>% 
  mutate(retention = max(SessionId))  # highest SessionId per user, as the question defines Retention

This gives:

  userId SessionId              Screen retention
   <chr>     <int>               <chr>     <int>
1     01         1         first, main         3
2     01         2               first         3
3     01         3 first, main, detail         3
4     02         1               first         1
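
Note that `toString` joins the screens with ", ", whereas the question asks for ";". A minimal base R sketch that produces the semicolon-separated form might look like the following; the sample data is copied from the question, and the names `paths`, `Path` and `Retention` are just illustrative choices.

```r
# Sample data copied from the question (only the columns needed here).
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  stringsAsFactors = FALSE
)

# Collapse the screens of each session into one ";"-separated path.
paths <- aggregate(Screen ~ userId + SessionId, data = d,
                   FUN = paste, collapse = ";")
names(paths)[names(paths) == "Screen"] <- "Path"

# Retention = highest SessionId seen for each user.
paths$Retention <- ave(paths$SessionId, paths$userId, FUN = max)

# Sort by user, then session, to match the layout in the question.
paths <- paths[order(paths$userId, paths$SessionId), ]
```

Using `max` rather than a row count keeps the result in line with the question's definition even if a user's session numbers are not consecutive.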
answered 2016-07-15T16:13:16.347

I have a data.table solution:

library(data.table)
dt <- as.data.table(d)
dt[, Retention := max(SessionId), by = .(userId)]
dt[, .(Screen = paste(Screen, collapse = ";"), Retention = unique(Retention)), by = .(userId, SessionId)]

   userId SessionId            Screen Retention
1:     01         1        first;main         3
2:     01         2             first         3
3:     01         3 first;main;detail         3
4:     02         1             first         1
answered 2016-07-15T16:03:39.470