
I have users who log in by date. My requirement is to track the number of users who logged in within the past 90-day window.

I am new to both SQL in general and Teradata specifically, and I can't get the window function to work the way I need.

I need the following result, where ACTIVE_IN_WINDOW is the count of unique USER_IDs that appear in the 90-day window before DATES.

DATES        ACTIVE_IN_WINDOW
12/06/2018     20
13/06/2018     45                 
14/06/2018     65 
15/06/2018     73 
17/06/2018     24      
18/06/2018     87  
19/06/2018     34
20/06/2018     51

My script so far is below.

This is the line I can't get right:

COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND  0 FOLLOWING)

I suspect I need a different set of functions to make this work.

SELECT    b.DATES , a.ACTIVE_IN_WINDOW

FROM    

(
        SELECT 

        CAST(CALENDAR_DATE AS DATE) AS DATES FROM SYS_CALENDAR.CALENDAR

        WHERE DATES BETWEEN ADD_MONTHS(CURRENT_DATE, - 10)  AND CURRENT_DATE
) b

LEFT JOIN

(
        SELECT    USER_ID   , EVT_DT 

        , COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND  0 FOLLOWING) AS ACTIVE_IN_WINDOW

        FROM ENV0.R_ONBOARDING
) a

ON a.EVT_DT = b.DATES

ORDER BY b.DATES

Thanks for any help.

2 Answers

The logic is similar to Gordon's, but on Teradata a non-equi join is usually more efficient than a correlated scalar subquery:

SELECT b.DATES , Count(DISTINCT USER_ID)
FROM
 (
   SELECT CALENDAR_DATE AS DATES 
   FROM SYS_CALENDAR.CALENDAR
   WHERE DATES BETWEEN Add_Months(Current_Date, - 10)  AND Current_Date
 ) b
LEFT JOIN
 ( -- apply DISTINCT before aggregation to reduce intermediate spool
   SELECT DISTINCT USER_ID, EVT_DT
   FROM ENV0.R_ONBOARDING
 ) AS a
ON a.EVT_DT BETWEEN Add_Months(b.DATES,-3) AND b.DATES
GROUP BY 1
ORDER BY 1

Of course, this will require a large spool and a lot of CPU.

Edit:

Switching to weeks reduces the overhead. I use dates instead of week numbers, which is easier to modify for other ranges:

SELECT b.Week , Count(DISTINCT USER_ID) 
FROM
 ( -- Return only Mondays instead of DISTINCT over all days 
   SELECT calendar_date AS Week
   FROM SYS_CALENDAR.CALENDAR 
   WHERE CALENDAR_DATE BETWEEN Add_Months(Current_Date, -9) AND Current_Date
     AND day_of_week = 2 -- 2 = Monday
 ) b 
LEFT JOIN 
 (
   SELECT DISTINCT USER_ID,
     -- td_monday returns the previous Monday, but we need the following monday
     -- covers the previous Tuesday up to the current Monday
            Td_Monday(EVT_DT+6) AS PERIOD_WEEK
   FROM ENV0.R_ONBOARDING
   -- You should add another condition to limit the actually covered date range, e.g.
   -- where EVT_DT BETWEEN Add_Months(b.DATES,-13) AND b.DATES
 ) AS a 
ON a.PERIOD_WEEK BETWEEN b.Week-(12*7) AND b.Week 
GROUP BY 1 
ORDER BY 1 

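Explain should show the calendar being duplicated in preparation for a product join; if it does not, you might need to materialize the dates in a volatile table. It is better not to use sys_calendar: it has no statistics, so the optimizer doesn't know, for example, how many days there are per week/month/year. Check your system; there should be a calendar table designed for your company's needs (with statistics on all columns).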
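
If the plan does not duplicate the calendar, materializing the dates could look roughly like the sketch below. The table name `vt_mondays` and column name `week_start` are hypothetical, chosen only for illustration, and the primary index choice is an assumption rather than a tuned recommendation.

```sql
-- Hypothetical volatile table holding one row per Monday in the reporting range.
-- vt_mondays and week_start are illustrative names, not part of the original query.
CREATE VOLATILE TABLE vt_mondays AS
 (
   SELECT calendar_date AS week_start
   FROM SYS_CALENDAR.CALENDAR
   WHERE calendar_date BETWEEN Add_Months(Current_Date, -9) AND Current_Date
     AND day_of_week = 2 -- 2 = Monday
 ) WITH DATA
PRIMARY INDEX (week_start)
ON COMMIT PRESERVE ROWS; -- keep the rows for the rest of the session

-- Give the optimizer row counts and value distribution for the join column.
COLLECT STATISTICS COLUMN (week_start) ON vt_mondays;
```

The weekly query would then read from `vt_mondays` in place of the derived table `b`, joining on `a.PERIOD_WEEK BETWEEN week_start-(12*7) AND week_start`.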

answered 2018-12-05T17:57:55.170

If your data is not too big, a subquery might be the simplest method:

SELECT c.dte,
       (SELECT COUNT(DISTINCT o.USER_ID)
        FROM ENV0.R_ONBOARDING o
        WHERE o.EVT_DT > ADD_MONTHS(dte, -3) AND
              o.EVT_DT <= dte
       ) as three_month_count
FROM (SELECT CAST(CALENDAR_DATE AS DATE) AS dte
      FROM SYS_CALENDAR.CALENDAR
      WHERE CALENDAR_DATE BETWEEN ADD_MONTHS(CURRENT_DATE, - 10)  AND CURRENT_DATE
     ) c;

You might want to start with a timeframe shorter than 3 months to see how the query performs.
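
As a rough sketch of such a trial run, the same query can be limited to one month of calendar dates with a 7-day lookback. Both limits and the alias `seven_day_count` are arbitrary assumptions for testing, not values from the question.

```sql
-- Trial run: one month of calendar dates, each with a 7-day lookback.
-- The 7-day window and 1-month range are placeholder values for a quick test.
SELECT c.dte,
       (SELECT COUNT(DISTINCT o.USER_ID)
        FROM ENV0.R_ONBOARDING o
        WHERE o.EVT_DT > c.dte - 7   -- DATE minus INTEGER subtracts days
          AND o.EVT_DT <= c.dte
       ) AS seven_day_count
FROM (SELECT CAST(CALENDAR_DATE AS DATE) AS dte
      FROM SYS_CALENDAR.CALENDAR
      WHERE CALENDAR_DATE BETWEEN ADD_MONTHS(CURRENT_DATE, -1) AND CURRENT_DATE
     ) c
ORDER BY c.dte;
```

If this runs acceptably, the lookback and the calendar range can be widened step by step toward the full three-month window.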

answered 2018-12-05T12:16:09.983