
A very simplified table with some sample data:

action_date account_id
1/1/2010    123
1/1/2010    123
1/1/2010    456
1/2/2010    123
1/3/2010    789

For the data above, I need a query that returns the following:

action_date num_events  num_unique_accounts  num_unique_accounts_wtd
1/1/2010    3           2                    2
1/2/2010    1           1                    2
1/3/2010    1           1                    3

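As you can see here, num_unique_accounts_wtd is a kind of rolling week-to-date count of unique accounts: for each date, the number of distinct accounts seen from the start of that week through that date.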
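To make the target definition concrete, here is a minimal Python sketch (not Oracle SQL) that reproduces the expected output from the sample rows. The names `ACTIONS`, `week_start` and `week_to_date_report` are hypothetical, and the Monday-based week is an assumption meant to mirror `NEXT_DAY(action_date, 'Monday') - 7` for date-only values.

```python
from datetime import date, timedelta

# Sample rows from the question: (action_date, account_id).
ACTIONS = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]


def week_start(d):
    """Monday on or before d (assumed to match NEXT_DAY(d, 'Monday') - 7)."""
    return d - timedelta(days=d.weekday())


def week_to_date_report(actions):
    """Return (action_date, num_events, num_unique_accounts, num_unique_accounts_wtd) rows."""
    by_day = {}
    for d, account_id in actions:
        by_day.setdefault(d, []).append(account_id)

    rows = []
    seen_this_week = set()   # accounts seen so far in the current week
    current_week = None
    for d in sorted(by_day):
        if week_start(d) != current_week:
            # A new week begins: reset the running set of accounts.
            current_week = week_start(d)
            seen_this_week = set()
        accounts = by_day[d]
        seen_this_week.update(accounts)
        rows.append((d, len(accounts), len(set(accounts)), len(seen_this_week)))
    return rows


for row in week_to_date_report(ACTIONS):
    print(row)
```

The last column is the size of a running set rather than a running sum, which is exactly the property the SQL has to reproduce.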

At first, one would think that a query of the form

WITH
    events AS
    (
        SELECT
            action_date
            , COUNT(account_id) num_events
            , COUNT(DISTINCT account_id) num_unique_accounts
        FROM     actions
        GROUP BY action_date
    )
SELECT
    action_date
    , num_events
    , num_unique_accounts
    , SUM(num_unique_accounts) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC) num_unique_accounts_wtd
FROM events

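would work, but if you look closely, it simply adds up each day's num_unique_accounts. To be clear, for 1/2/2010 the query gives num_unique_accounts_wtd = 3 (because 2 + 1), whereas the desired value is 2, since account 123 was already counted on 1/1/2010.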
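The difference between the two computations can be seen on the sample data with a small Python sketch (illustrative only; the variable names are hypothetical). A running sum of per-day distinct counts double-counts accounts that appear on more than one day, while a running union does not.

```python
from itertools import accumulate

# Distinct accounts seen on each day of the sample week, in date order.
daily_accounts = [
    {123, 456},  # 1/1/2010
    {123},       # 1/2/2010
    {789},       # 1/3/2010
]

# What SUM(num_unique_accounts) OVER (... ORDER BY action_date) computes:
# a running total of the per-day distinct counts.
running_sum = list(accumulate(len(day) for day in daily_accounts))

# What is actually wanted: the size of the running union of accounts.
running_union = [len(s) for s in accumulate(daily_accounts, lambda a, b: a | b)]

print(running_sum)    # [2, 3, 4]
print(running_union)  # [2, 2, 3]
```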

Any ideas?

EDIT: Added another row of data and output for clarity.


2 Answers


I would split the events query in two:

WITH
    events1 AS
    (
        SELECT 
               NEXT_DAY(action_date, 1) - 7 week
             , action_date             
             , COUNT(account_id) num_events
             , COUNT(DISTINCT account_id) num_unique_accounts
        FROM     actions
        GROUP BY action_date
    ),
    events2 AS
    (
        SELECT NEXT_DAY(action_date, 1) - 7 week               
             , COUNT(DISTINCT account_id) num_unique_accounts_wtd
        FROM     actions
        GROUP BY NEXT_DAY(action_date, 1)
    )
SELECT events1.*, events2.num_unique_accounts_wtd
  FROM events1, events2 
 WHERE events1.week = events2.week

Here events1 selects the number of distinct accounts per day, while events2 selects the number of distinct accounts per week.

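EDIT: I understand the requirement now. However, the only idea I have would be heavy if the number of rows in the actions table is very high: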

WITH
events AS
(
    SELECT 
           NEXT_DAY(action_date, 1) - 7 week
         , action_date             
         , COUNT(account_id) num_events
         , COUNT(DISTINCT account_id) num_unique_accounts
    FROM     actions
    GROUP BY action_date 
)      
SELECT events.*, 
      (SELECT COUNT(DISTINCT account_id)
         FROM actions
        WHERE action_date >= events.week         -- from the start of the week
          AND action_date <= events.action_date  -- through the current row's date
      ) AS num_unique_accounts_wtd
 FROM events
ORDER BY events.action_date

As you can see, the idea is to (re)count the distinct account_id values for every row of the events subquery.
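The cost of this approach is easier to see in a plain Python sketch (not Oracle SQL; `ACTIONS`, `week_start` and `wtd_by_recount` are hypothetical names). It assumes the recount is bounded to the rows from the start of the week through the current day, and that weeks start on Monday. Every day triggers a full rescan of the rows, which is why this gets heavy on a large table.

```python
from datetime import date, timedelta

# Sample rows from the question: (action_date, account_id).
ACTIONS = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]


def week_start(d):
    """Monday on or before d (assumption: weeks start on Monday)."""
    return d - timedelta(days=d.weekday())


def wtd_by_recount(actions):
    """For each distinct day, rescan all rows from the week start through that day."""
    result = {}
    for day in sorted({d for d, _ in actions}):
        start = week_start(day)
        # Plays the role of the correlated scalar subquery: one scan per day.
        result[day] = len({acct for d, acct in actions if start <= d <= day})
    return result


for day, wtd in wtd_by_recount(ACTIONS).items():
    print(day.isoformat(), wtd)
```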

answered 2012-09-12T08:50:13.070

It seems the answer might be to modify the analytic function to include something of the form

COUNT(DISTINCT ...) OVER (PARTITION BY ... ORDER BY ... RANGE BETWEEN ... AND ...) 

Because RANGE BETWEEN allows expressions, the PARTITION BY window could be narrowed further to get what we are looking for. Unfortunately, Oracle raises an

ORA-30487 DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY

error, so we cannot use it.

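After googling the error, I found that others had tried the same thing (here and here), and the links contained two answers, one of which worked for my real data.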

For reference, using the model from the original post, the answer to this question would be of the following form:

SELECT    action_date, COUNT(account_id) num_attempts, MAX(num_accounts) num_unique_accounts_wtd
FROM
(
    SELECT
        action_date
        , account_id
        , SUM(is_unique) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC, account_id ASC) num_accounts
    FROM
    (
        SELECT
            action_date
            , account_id
            , CASE
                WHEN LAG(account_id) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7, account_id ORDER BY action_date ASC) = account_id 
                THEN 0
                ELSE 1
            END is_unique
            FROM
                actions
    )
)
GROUP BY  action_date

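So the data is

  1. iterated over to determine, for each account number within a week, whether the row is the first (unique) occurrence of that account
  2. then, for each week, ordered by action date and then account_id, and a running total of the unique flags is created
  3. grouped by action date, taking the maximum of the running total for each date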
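
The three steps above can be sketched in plain Python to check them against the sample data. This is an illustration of the logic only, not the Oracle query itself; `ACTIONS`, `week_start` and `wtd_by_first_occurrence` are hypothetical names, and the Monday-based week is assumed to match `NEXT_DAY(action_date, 'Monday') - 7`.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (action_date, account_id).
ACTIONS = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]


def week_start(d):
    """Monday on or before d (assumption: weeks start on Monday)."""
    return d - timedelta(days=d.weekday())


def wtd_by_first_occurrence(actions):
    """Return (action_date, num_attempts, num_unique_accounts_wtd) rows."""
    ordered = sorted(actions)  # by action_date, then account_id

    # Step 1: flag the first row of each (week, account) pair with 1, others with 0.
    seen_pairs = set()
    flagged = []
    for d, account_id in ordered:
        key = (week_start(d), account_id)
        flagged.append((d, 0 if key in seen_pairs else 1))
        seen_pairs.add(key)

    # Step 2: running total of the flags, restarted for every week.
    running = {}
    totals = []
    for d, is_unique in flagged:
        wk = week_start(d)
        running[wk] = running.get(wk, 0) + is_unique
        totals.append((d, running[wk]))

    # Step 3: group by date, counting rows and keeping the largest running total.
    result = []
    for d, group in groupby(totals, key=lambda t: t[0]):
        values = [n for _, n in group]
        result.append((d, len(values), max(values)))
    return result


for row in wtd_by_first_occurrence(ACTIONS):
    print(row)
```

Unlike the per-row recount, this makes a single ordered pass over the rows, which is presumably why it scales better on real data.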
answered 2012-09-12T16:41:54.927