Calculating retention rate / churn rate by year

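Dear community, I am working on a data mining project and want to move my earlier approach from Excel to R.

I have a customer database with contract data and want to calculate the retention rate. I have been experimenting with library(lubridate), library(reshape2), and library(plyr), but I can't figure out how to do this in R.

I have data like this: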

ID    Customer        START          END
 1       Tesco   01-01-2000   31-12-2000
 2       Apple   05-11-2001   06-02-2002
 3         H&M   01-02-2002   08-05-2002
 4       Tesco   01-01-2001   31-12-2001
 5       Apple   01-01-2003   31-12-2004

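My current idea is to split the data by year (df2000, df2001) and then look up whether each customer name from the main table appears in that year's subset (returning 1 if it does).

The result might look like this: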

Customer     2000    2001    2002  2003   Retention Rate
Tesco         1        1      0     0          0.5
Apple         0        1      0     1
H&M           0        0      1     0

1 Answer

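Using dplyr, you can extract the year from each START date, count the number of entries for each Customer and year, calculate the retention rate, and spread the data to wide format.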

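library(dplyr)
df %>%
  # Extract the calendar year of each contract start
  mutate(year = format(as.Date(START, format = "%d-%m-%Y"), "%Y")) %>%
  # One row per Customer/year pair, with the number of contracts in n
  dplyr::count(Customer, year) %>%
  group_by(Customer) %>%
  # n() = years in which this customer appears; `.` is the whole piped table,
  # so n_distinct(.$year) = number of distinct years across all customers
  mutate(ret = n()/n_distinct(.$year))  %>%
  # One column per year; years without a contract are filled with 0
  tidyr::spread(year, n, fill = 0)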

#  Customer   ret  `2000` `2001` `2002` `2003`
#  <fct>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1 Apple     0.5       0      1      0      1
#2 H&M       0.25      0      0      1      0
#3 Tesco     0.5       1      1      0      0
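
For comparison, here is a minimal base R sketch of the 0/1 table the question asks for. It assumes that "retention rate" means the share of observed years in which a customer started at least one contract, which is how the `ret` column above behaves. The names `yr`, `active` and `ret` are illustrative only, and the sample data is retyped as character vectors rather than factors.

```r
# Sample data from the question (only the columns needed here)
df <- data.frame(
  Customer = c("Tesco", "Apple", "H&M", "Tesco", "Apple"),
  START    = c("01-01-2000", "05-11-2001", "01-02-2002",
               "01-01-2001", "01-01-2003"),
  stringsAsFactors = FALSE
)

# Calendar year of each contract start
yr <- format(as.Date(df$START, format = "%d-%m-%Y"), "%Y")

# 1 if the customer started at least one contract in that year, else 0
active <- (table(df$Customer, yr) > 0) * 1

# Share of observed years in which each customer was active
ret <- rowMeans(active)

print(active)
print(ret)
```

Unlike the count in the dplyr version, this caps each cell at 1, so a customer with two contracts starting in the same year still shows 1 for that year.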

Edit

To group the data by a fiscal year that runs from October to September instead of the calendar year, we can do:

library(lubridate)

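df %>%
  mutate(START = dmy(START), 
         # Shift October-December starts forward one year so they fall into
         # the fiscal year that ends the following September
         START = if_else(month(START) >= 10, START + years(1), START),
         year = year(START)) %>%
  dplyr::count(Customer, year) %>%
  group_by(Customer) %>%
  # Same retention calculation as above, now per fiscal year
  mutate(ret = n()/n_distinct(.$year))  %>%
  tidyr::spread(year, n, fill = 0)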
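
To make the fiscal-year shift concrete, the following base R sketch applies the same rule to the START dates from the question: a start in October or later is assigned to the next year. The variable names (`start`, `cal_year`, `mon`, `fiscal_year`) are illustrative and not part of the answer's code.

```r
# START dates from the question, parsed as day-month-year
start <- as.Date(
  c("01-01-2000", "05-11-2001", "01-02-2002", "01-01-2001", "01-01-2003"),
  format = "%d-%m-%Y"
)

cal_year <- as.integer(format(start, "%Y"))  # calendar year
mon      <- as.integer(format(start, "%m"))  # month number, 1 to 12

# October-December starts belong to the fiscal year ending next September
fiscal_year <- cal_year + (mon >= 10)

print(data.frame(START = start, cal_year, fiscal_year))
```

Only the 05-11-2001 start changes here: it moves from calendar year 2001 to fiscal year 2002, while the other dates keep their year.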

Data

df <- structure(list(ID = 1:5, Customer = structure(c(3L, 1L, 2L, 3L, 
1L), .Label = c("Apple", "H&M", "Tesco"), class = "factor"), 
START = structure(c(1L, 5L, 4L, 2L, 3L), .Label = c("01-01-2000", 
"01-01-2001", "01-01-2003", "01-02-2002", "05-11-2001"), class = "factor"), 
END = structure(c(3L, 1L, 2L, 4L, 5L), .Label = c("06-02-2002", 
"08-05-2002", "31-12-2000", "31-12-2001", "31-12-2004"), class = "factor")), 
class = "data.frame", row.names = c(NA, -5L))
Answered 2019-09-06T07:27:25.350