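I want to use two different (but similar) window functions to compute two values, SUM and COUNT on is_active over user_id + item, but only over rows up to the current row's time minus 1 hour. My first instinct was to use ROWS UNBOUNDED PRECEDING, but then I cannot filter by time: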

COUNT(1) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING) 
SUM(is_active) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)

However, this does not take the "1 hour ago" interval into account.

Consider the following data:

user_id | req_time            | item | is_active
--------+---------------------+------+----------
1       | 2011-01-01 12:00:00 | 1    | 0
1       | 2011-01-01 12:30:00 | 1    | 1
1       | 2011-01-01 15:00:00 | 1    | 1
1       | 2011-01-01 16:00:00 | 1    | 0
1       | 2011-01-01 16:00:00 | 2    | 0
1       | 2011-01-01 16:20:00 | 2    | 1
2       | 2011-02-02 11:00:00 | 1    | 1
2       | 2011-02-02 13:00:00 | 1    | 0
1       | 2011-02-02 16:20:00 | 1    | 0
1       | 2011-02-02 16:30:00 | 2    | 0

I would like to get the following result, where "value 1" is SUM(is_active) and "value 2" is COUNT(1):

user_id | req_time            | item | value 1 | value 2
--------+---------------------+------+---------+--------
1       | 2011-01-01 12:00:00 | 1    | 0       | 0
1       | 2011-01-01 12:30:00 | 1    | 0       | 0
1       | 2011-01-01 15:00:00 | 1    | 1       | 2
1       | 2011-01-01 16:00:00 | 1    | 2       | 3
1       | 2011-01-01 16:00:00 | 2    | 0       | 0
1       | 2011-01-01 16:20:00 | 2    | 0       | 0
2       | 2011-02-02 11:00:00 | 1    | 0       | 0
2       | 2011-02-02 13:00:00 | 1    | 1       | 1
1       | 2011-02-02 16:20:00 | 1    | 2       | 4
1       | 2011-02-02 16:30:00 | 2    | 1       | 2
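
The rule implied by this expected output can be checked with a small brute-force sketch in Python. It is not a Greenplum solution, only a reference for validating any SQL rewrite. The function name `prior_totals` and the constant `ROWS` are made up for illustration, and the sketch assumes the cutoff is inclusive (a row exactly one hour older is counted), which is what the 16:00 row for item 1 in the table suggests.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, req_time, item, is_active).
ROWS = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]


def prior_totals(rows, lag=timedelta(hours=1)):
    """Return one (sum_is_active, count) pair per input row.

    Each pair covers the rows with the same (user_id, item) whose req_time
    is at or before the current row's req_time minus `lag`.
    """
    parsed = [
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), item, active)
        for user, ts, item, active in rows
    ]
    totals = []
    for user, ts, item, _ in parsed:
        cutoff = ts - lag  # assumption: inclusive cutoff (<=)
        matched = [
            active
            for user2, ts2, item2, active in parsed
            if user2 == user and item2 == item and ts2 <= cutoff
        ]
        totals.append((sum(matched), len(matched)))
    return totals


for row, (value1, value2) in zip(ROWS, prior_totals(ROWS)):
    print(row[0], row[1], row[2], value1, value2)
```

The scan is quadratic in the number of rows, which is fine for checking a handful of sample rows but not meant for real data volumes.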

I am using Greenplum 4.21, which is based on PostgreSQL 8.2.15.

Thanks in advance! 吉利比

2 Answers

I'm not sure how to do this with window functions, at least not easily.

The simplest way I know is to use correlated subqueries in the select clause:

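select t.*,
       -- value1: SUM(is_active); coalesce turns the NULL of an empty set into 0
       (select coalesce(sum(t2.is_active), 0) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              -- <= so that a row exactly one hour older is included,
              -- as in the expected output of the question
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value1,
       -- value2: COUNT over the same rows
       (select count(*) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value2
from t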

You can also do this without correlated subqueries. It is just a bit more cumbersome:

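select t.user_id, t.req_time, t.item,
       -- value1: SUM(is_active), 0 when no earlier row matches
       coalesce(sum(t2.is_active), 0) as value1,
       -- value2: count a t2 column rather than *, so unmatched rows give 0, not 1
       count(t2.user_id) as value2
from t left outer join
     t t2
     on t.user_id = t2.user_id and
        t.item = t2.item and
        t2.req_time <= t.req_time - interval '1 hour'
group by t.user_id, t.req_time, t.item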

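This may be more efficient than the correlated subquery version (which runs two correlated subqueries per row). It should also work in Greenplum. I did not realize that it lacks support for correlated subqueries. That is a major departure from ANSI SQL.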
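
One detail worth checking in any LEFT JOIN formulation is what gets counted when a row has no earlier match: the outer join still produces one row with NULLs on the right side, so COUNT(*) reports 1 while COUNT of a right-side column reports 0. The sketch below demonstrates this with Python's built-in sqlite3 module on the first four sample rows. It is an illustration only: SQLite's `datetime(..., '-1 hour')` stands in for PostgreSQL's `interval '1 hour'`, timestamps are stored as text, and the column aliases are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, req_time TEXT, item INTEGER, is_active INTEGER)"
)
# First four sample rows from the question (user 1, item 1).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01 12:00:00", 1, 0),
        (1, "2011-01-01 12:30:00", 1, 1),
        (1, "2011-01-01 15:00:00", 1, 1),
        (1, "2011-01-01 16:00:00", 1, 0),
    ],
)

# SQLite dialect: datetime(x, '-1 hour') replaces x - interval '1 hour'.
QUERY = """
SELECT t.req_time,
       COUNT(*)                       AS count_star,     -- counts the NULL-padded row too
       COUNT(t2.user_id)              AS count_matched,  -- counts only real matches
       COALESCE(SUM(t2.is_active), 0) AS sum_active
FROM t
LEFT JOIN t AS t2
       ON t2.user_id = t.user_id
      AND t2.item = t.item
      AND t2.req_time <= datetime(t.req_time, '-1 hour')
GROUP BY t.user_id, t.req_time, t.item
ORDER BY t.req_time
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

For the two rows with no qualifying predecessor, `count_star` comes out as 1 and `count_matched` as 0; for the later rows the two counts agree.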

answered 2013-02-21T16:57:07.813

8.3 on SQL Fiddle. Only a single subselect.

select user_id, req_time, item, v[1] as value1, v[2] as value2
from (
    select t.*,
        (
            select array[
                coalesce(sum(is_active::integer), 0),
                count(*)
                ] as v
            from t s
            where
                user_id = t.user_id
                and item = t.item
                and req_time <= t.req_time - interval '1 hour'
        ) as v
    from t
) s
order by req_time, user_id, item
answered 2013-02-21T18:11:14.533