sql - Calculating retention (or finding whether a record in one set exists in another set)

Question

I am using SQL Server from R. I can use either one to solve my problem.

Here is my data:

structure(list(id = c(1, 2, 3, 1, 2), FY = c(2010, 2008, 2009, 2011, 2009), sales = c(100, 200, 300, 400, 500)), .Names = c("id", "FY", "sales"), row.names = c(NA, -5L), class = "data.frame")

I call it test:

 id FY   sales
 1 2010   100
 2 2008   200
 3 2009   300
 1 2011   400
 2 2009   500

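Edit: What I want to find is customer retention, i.e. who bought in both 2008 and 2009; who bought in both 2009 and 2010; who bought in both 2010 and 2011.

The final result grid would put a 1 (or any non-null value) under each year in which the customer was retained into the following year.

The end result I am trying to get would look like this: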

id 2008 2009 2010 2011
1               1     
2     1

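With a table like this, I can calculate the retention percentage for each year.

Now, I could write a variety of CASE statements and subqueries to create such a grid, but my actual data spans more than 10 years, and I hate hard-coding the years. Perhaps this would be easier to do in R once the data is cast, but I am having a hard time coding it.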

score 3 · Accepted Answer

tbl <- xtabs( ~ id+FY, data=test)  #......
tbl

That tabulates the positive sales, and you want the ones where consecutive years are both 1:

 0+( tbl[ , -1]==1 & tbl[,-ncol(tbl)]==1)
#-------
   FY
id  2009 2010 2011
  1    0    0    1
  2    1    0    0
  3    0    0    0

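The logical operation yields a matrix of TRUEs and FALSEs, and adding 0 to a logical converts it to 0/1. I noticed that this result is labeled differently from yours, and I think it is the more acceptable labeling: yours would suggest that we can see into the future. If you disagree, you can reverse the arguments, since the column labels are taken from the first argument: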

0+( tbl[,-ncol(tbl)]==1 &tbl[ , -1]==1)
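To get from that 0/1 grid to the per-year retention percentage the question asks for, something along these lines might work. This is only a sketch in base R: the names retained, buyers, and retpct are made up for illustration, and the "> 0" test is an assumption that departs from the "== 1" above so that a customer with several purchases in one year still counts.

```r
# Rebuild the example data from the question
test <- data.frame(id    = c(1, 2, 3, 1, 2),
                   FY    = c(2010, 2008, 2009, 2011, 2009),
                   sales = c(100, 200, 300, 400, 500))

tbl <- xtabs(~ id + FY, data = test)   # purchase counts per id and year
n   <- ncol(tbl)

# 1 where a customer bought in a year AND in the following year.
# Columns are labelled by the earlier year because tbl[, -n] comes first.
# "> 0" (an assumption, not the answer's "== 1") tolerates repeat purchases.
retained <- 0 + (tbl[, -n] > 0 & tbl[, -1] > 0)

# Share of each year's buyers who bought again the next year
buyers <- colSums(tbl[, -n] > 0)
retpct <- colSums(retained) / buyers
retpct
```

The last year in the data has no following year to compare against, so it is dropped from the result rather than reported as zero.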

score 0 · Accepted Answer

My own ugly hack:

    require('sqldf')
    require('plyr')

    # For each fiscal year, count the rows whose id also appears in the following year
    test_returning <- ddply(test, .(FY), function(df) {
      cur_fy <- unique(df$FY)
      q1 <- 'select count(1) as returning from df'
      q2 <- paste('where exists (select 0 from test t where t.id = df.id and t.FY = ', cur_fy + 1, ")", sep = " ")
      qry <- paste(q1, q2, sep = " ")
      sqldf(qry)
    })

    # Total number of rows (customers) per fiscal year
    test_total <- ddply(test, .(FY), summarize, total = length(id))
    # Join the two counts by FY and compute the retention rate
    test_retention <- merge(test_returning, test_total, all.y = TRUE)
    test_retention$retpct <- with(test_retention, returning/total)
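The same "exists in the next year" check can probably be done without sqldf or plyr by joining the data to a copy of itself shifted by one year. The following is a sketch in base R only, not the method used above; bought, nxt, and ret_count are illustrative names.

```r
# Rebuild the example data from the question
test <- data.frame(id    = c(1, 2, 3, 1, 2),
                   FY    = c(2010, 2008, 2009, 2011, 2009),
                   sales = c(100, 200, 300, 400, 500))

# One row per (id, FY) purchase; duplicates within a year are collapsed
bought <- unique(test[, c("id", "FY")])

# Shift each purchase back one year: a row (id, FY) in nxt means
# "this id bought in FY + 1"
nxt <- transform(bought, FY = FY - 1)

# The inner join keeps purchases that were followed by one in the next year
returning <- merge(bought, nxt, by = c("id", "FY"))

# Count buyers and returning buyers over the same set of years
years     <- sort(unique(bought$FY))
total     <- table(factor(bought$FY,    levels = years))
ret_count <- table(factor(returning$FY, levels = years))

test_retention <- data.frame(FY        = years,
                             returning = as.vector(ret_count),
                             total     = as.vector(total))
test_retention$retpct <- with(test_retention, returning / total)
test_retention
```

Because the years come from the data, nothing is hard-coded; the final year always shows a retention of 0 since no later year exists to match against.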

score 0 · Accepted Answer

Here is one way to do the pivot (using aggregation):

select id,
       p2008 * p2009 as [2008],
       p2009 * p2010 as [2009],
       p2010 * p2011 as [2010],
       p2011 * p2012 as [2011]
from (select id,
             -- pYYYY is 1 if the customer bought in that year, otherwise NULL
             max(case when FY = 2008 then 1 end) as p2008,
             max(case when FY = 2009 then 1 end) as p2009,
             max(case when FY = 2010 then 1 end) as p2010,
             max(case when FY = 2011 then 1 end) as p2011,
             max(case when FY = 2012 then 1 end) as p2012
      from test t
      group by id
     ) t
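Since the question is driven from R and wants to avoid hard-coding the years, the text of a query of this shape could be generated instead of typed. The sketch below assumes the per-year flags are aggregated with max() and grouped by id in the subquery; build_retention_sql is a hypothetical helper, not part of any package.

```r
# Hypothetical helper (not from the answer): builds the pivot query for
# every pair of consecutive years between first_fy and last_fy.
build_retention_sql <- function(first_fy, last_fy, table_name = "test") {
  years <- as.integer(first_fy):as.integer(last_fy)
  k     <- length(years)
  # One 0/NULL flag per year, aggregated per customer
  flags <- sprintf("max(case when FY = %d then 1 end) as p%d", years, years)
  # Product of consecutive flags, labelled by the earlier year
  pairs <- sprintf("p%d * p%d as [%d]", years[-k], years[-1], years[-k])
  paste0("select id,\n       ",
         paste(pairs, collapse = ",\n       "),
         "\nfrom (select id,\n             ",
         paste(flags, collapse = ",\n             "),
         "\n      from ", table_name, " t\n      group by id\n     ) t")
}

qry <- build_retention_sql(2008, 2011)
cat(qry, "\n")
```

The resulting string can then be sent to SQL Server through whichever database interface is already in use. Unlike the hand-written query, this sketch stops at the last year that has a successor, so no column is produced for the final year.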

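If you are interested in doing retention analysis like this, you should learn about survival analysis, particularly recurrent event analysis.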
