I'm trying to run some analytics on the usage of our web-based application. I have a table with the following columns: email address and activity date.

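I want to create a query that answers this question: for each day in the last 180 days, how many people had activity between 60 and 30 days before that day, and also had activity between 30 and 0 days before it?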

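I already have this working as a stored procedure in which I literally loop over each of the last 180 days (using a date table with one row per day), but it is a bit slow because I'm running 180 queries.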

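I also tried doing this with a single query that uses an IN clause, but it took about 5 minutes to complete (the table only has about 2,000 rows in total, so I'm guessing the query is highly unoptimized).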

How would I do this with a single optimized query (or even a stored procedure)?

In case it helps, here is the current stored procedure (it works, but it is slow):

BEGIN
    DECLARE mydate DATE;
    DECLARE period1 INT;
    DECLARE period2 INT;
    DECLARE done INT;

    DECLARE cur CURSOR FOR SELECT date_value FROM dim_date ORDER BY date_value DESC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    SET done = 0;
    OPEN cur;

    REPEAT
        FETCH cur INTO mydate;
        IF NOT done THEN
            REPLACE INTO churn (payment_received, period2, period1, churn_name)
            SELECT
                mydate,
                COUNT(DISTINCT CASE WHEN sales.payment_received BETWEEN DATE_SUB(mydate, INTERVAL p2 MONTH) AND DATE_SUB(mydate, INTERVAL p1 MONTH) THEN email END) AS period2,
                (
                    SELECT COUNT(DISTINCT CASE WHEN sales.payment_received BETWEEN DATE_SUB(mydate, INTERVAL p1 MONTH) AND mydate THEN email END)
                    FROM sales
                    WHERE subscription = 1
                      AND email IN (
                          SELECT email
                          FROM sales
                          WHERE sales.payment_received BETWEEN DATE_SUB(mydate, INTERVAL p2 MONTH) AND DATE_SUB(mydate, INTERVAL p1 MONTH)
                      )
                ) AS period1,
                churn_name AS cname
            FROM sales
            WHERE subscription = 1;
        END IF;
    UNTIL done END REPEAT;
    CLOSE cur;

END;;

Thanks!

2 Answers

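I'm going to go ahead and assume that dim_date is a calendar table (a really handy thing to have). It would also be nice to know which indexes you have, if any, but at 2,000 rows a decent RDBMS will probably load the whole table into memory anyway, so that may not be a factor.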
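
On the indexing point: if the table grows well past 2,000 rows, a composite index over the columns this thread filters and joins on is one thing to try. This is only a sketch. The index name is made up, and the column order is an assumption based on the predicates in the question (equality on subscription, then the payment_received range, then email). Check the plan with EXPLAIN before relying on it.

```sql
-- Hypothetical index name; column order is an assumption, not a measured result.
-- subscription: equality filter; payment_received: range filter; email: joined and counted.
CREATE INDEX idx_sales_sub_received_email
    ON sales (subscription, payment_received, email);
```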

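Unfortunately, no matter how you look at it, this type of analysis is going to take time. I'm fairly sure that converting it to a fully set-based approach will speed things up, but I don't have an instance to actually test on. I would start by rewriting the statement like this: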

SELECT dim_date.date_value,
       COUNT(DISTINCT Period_2.email) AS period2,
       COUNT(DISTINCT Period_1.email) AS period1,
       Period_2.churn_name
FROM dim_date
JOIN sales Period_2
  ON Period_2.payment_received >= DATE_SUB(dim_date.date_value, INTERVAL 60 DAY)
     AND Period_2.payment_received < DATE_SUB(dim_date.date_value, INTERVAL 30 DAY)
     AND Period_2.subscription = 1
LEFT JOIN sales Period_1
       ON Period_1.payment_received >= DATE_SUB(dim_date.date_value, INTERVAL 30 DAY)
          AND Period_1.payment_received < dim_date.date_value
          AND Period_1.subscription = 1
          AND Period_1.email = Period_2.email
          AND Period_1.churn_name = Period_2.churn_name
WHERE dim_date.date_value >= DATE_SUB(CURRENT_DATE, INTERVAL 180 DAY)
      AND dim_date.date_value < CURRENT_DATE
-- Group on Period_2.churn_name: Period_1 is LEFT JOINed, so its columns are NULL for users with no recent activity
GROUP BY dim_date.date_value, Period_2.churn_name

This statement should run, but it is untested.
(...I'm not sure what I was thinking here originally; I wasn't correlating the two sets per user...)

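One thing: you don't appear to have subscription = 1 as a condition in the innermost subquery. I don't know whether that is deliberate or an oversight. I'm also assuming that churn_name should be correlated, whatever it is.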
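
To replace the 180 per-day REPLACE statements from the question, the set-based query could feed the churn table directly. This is a rough, untested sketch. The target column list is copied from the question's REPLACE. It also assumes that churn_name is a column of sales (the question does not say), and it uses day-based windows rather than the procedure's p1 and p2 month intervals. It groups on the period-2 side so that users with no recent activity still keep their churn_name.

```sql
-- Sketch only: one statement instead of one REPLACE per calendar day.
-- Assumes churn_name is a column of sales; the aliases d, p1, p2 are illustrative.
REPLACE INTO churn (payment_received, period2, period1, churn_name)
SELECT d.date_value,
       COUNT(DISTINCT p2.email) AS period2,   -- active 60 to 30 days before the date
       COUNT(DISTINCT p1.email) AS period1,   -- of those, also active in the 30 days before the date
       p2.churn_name
FROM dim_date d
JOIN sales p2
  ON p2.subscription = 1
 AND p2.payment_received >= DATE_SUB(d.date_value, INTERVAL 60 DAY)
 AND p2.payment_received <  DATE_SUB(d.date_value, INTERVAL 30 DAY)
LEFT JOIN sales p1
  ON p1.subscription = 1
 AND p1.email = p2.email
 AND p1.churn_name = p2.churn_name
 AND p1.payment_received >= DATE_SUB(d.date_value, INTERVAL 30 DAY)
 AND p1.payment_received <  d.date_value
WHERE d.date_value >= DATE_SUB(CURRENT_DATE, INTERVAL 180 DAY)
  AND d.date_value <  CURRENT_DATE
GROUP BY d.date_value, p2.churn_name;
```

Days on which nobody was active in the earlier window produce no row at all, because the first join is an inner join.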

answered 2013-03-15T17:31:24.387

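Step 1) Get the users who had activity in the last month (DISTINCT, because we don't care how many times they were active in the last month, only whether they were active at all):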

SELECT DISTINCT email
FROM sales
WHERE payment_received BETWEEN DATE_ADD(NOW(), INTERVAL -1 MONTH) AND NOW()

Step 2) Get the users who had activity 1 to 2 months ago:

SELECT DISTINCT email
FROM sales
WHERE payment_received BETWEEN DATE_ADD(NOW(), INTERVAL -2 MONTH) AND DATE_ADD(NOW(), INTERVAL -1 MONTH)

Step 3) Join these into a single result set:

SELECT M1.email
FROM (
  SELECT DISTINCT email
  FROM sales
  WHERE payment_received BETWEEN DATE_ADD(NOW(), INTERVAL -1 MONTH) AND NOW()
) M1,
(
  SELECT DISTINCT email
  FROM sales
  WHERE payment_received BETWEEN DATE_ADD(NOW(), INTERVAL -2 MONTH) AND DATE_ADD(NOW(), INTERVAL -1 MONTH)
) M2
WHERE M1.email = M2.email
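
If only the number of such users is needed, the same three steps can be collapsed into a count. This is a sketch under a few assumptions. The alias retained_users is made up. It uses half-open ranges instead of BETWEEN, so a payment that falls exactly on the one-month boundary is counted in one window only. It also looks at the current moment only, not at each of the last 180 days.

```sql
-- Sketch: count users active in both windows, as of now.
SELECT COUNT(*) AS retained_users
FROM (
  -- Step 1: active in the last month
  SELECT DISTINCT email
  FROM sales
  WHERE payment_received >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
    AND payment_received <  NOW()
) AS m1
JOIN (
  -- Step 2: active 1 to 2 months ago
  SELECT DISTINCT email
  FROM sales
  WHERE payment_received >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
    AND payment_received <  DATE_SUB(NOW(), INTERVAL 1 MONTH)
) AS m2
  ON m1.email = m2.email;  -- Step 3: keep only users present in both sets
```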
answered 2013-03-15T16:47:35.457