
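I have found solutions for finding the date of the next event, but none that also returns all of the data from that event. By cheating I can get it done, but that only works in MySQL and fails in Vertica.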

Here is the problem I am trying to solve:

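I want to show every event a together with the data from the first event X that comes after it and is not of type a. Here is a copy-and-paste example so you can try it yourself and see it in action: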

CREATE TABLE events (user_id int, created_at int, event varchar(20));
INSERT INTO events values (0,0, 'a');
INSERT INTO events values (0,1, 'b');
INSERT INTO events values (0,2, 'c');
INSERT INTO events values (0,3, 'a');
INSERT INTO events values (0,4, 'c');
INSERT INTO events values (0,5, 'b');
INSERT INTO events values (0,6, 'a');
INSERT INTO events values (0,7, 'a');
INSERT INTO events values (0,8, 'd');

SELECT * FROM events;
+---------+------------+-------+
| user_id | created_at | event |
+---------+------------+-------+
|       0 |          0 | a     |
|       0 |          1 | b     |
|       0 |          2 | c     |
|       0 |          3 | a     |
|       0 |          4 | c     |
|       0 |          5 | b     |
|       0 |          6 | a     |
|       0 |          7 | a     |
|       0 |          8 | d     |
+---------+------------+-------+
9 rows in set (0.00 sec)

Here is how I can get the timestamps of both events, but I can't seem to get the event information in there as well:

SELECT user_id, MAX(purchased) AS purchased, spent 
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;

 user_id | purchased | spent 
---------+-----------+-------
       0 |         0 |     1
       0 |         3 |     4
       0 |         7 |     8
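To see what each level of this query contributes, the sketch below runs it against the sample data. It assumes Python's built-in `sqlite3` module purely as a local test harness; SQLite is only standing in for Vertica here. The inner query on its own pairs both the `a` at 6 and the `a` at 7 with `spent = 8`. The outer `MAX(purchased) ... GROUP BY user_id, spent` collapses those two rows and keeps 7, which is why only three rows come back.

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database
# (SQLite is only a stand-in for Vertica here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

# Inner query: each 'a' with the timestamp of the first non-'a' at or after it.
INNER = """
    SELECT e1.user_id AS user_id, e1.created_at AS purchased,
           MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE e1.user_id = e2.user_id AND e1.created_at <= e2.created_at
      AND e1.event = 'a' AND e2.event != 'a'
    GROUP BY e1.user_id, e1.created_at
"""
inner_rows = conn.execute(INNER + " ORDER BY purchased").fetchall()

# Outer query: one row per (user_id, spent), keeping the latest 'a' before it.
outer_rows = conn.execute(
    "SELECT user_id, MAX(purchased) AS purchased, spent "
    f"FROM ({INNER}) e3 GROUP BY user_id, spent ORDER BY spent"
).fetchall()

print(inner_rows)  # [(0, 0, 1), (0, 3, 4), (0, 6, 8), (0, 7, 8)]
print(outer_rows)  # [(0, 0, 1), (0, 3, 4), (0, 7, 8)]
```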

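Now, if I also want the event type, the query above doesn't work: you would either have to put the event field in the GROUP BY (not what we want) or wrap it in an aggregate (not what we want either). Funnily enough, it works in MySQL, but I consider that cheating, and since I have to use Vertica for this, it doesn't help me: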

SELECT user_id, MAX(purchased) as purchased, spent, event FROM (
    SELECT 
        e1.User_ID AS user_id, 
        e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent, 
        e2.event AS event 
    FROM events e1, events e2 
    WHERE 
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND 
        e1.Event = 'a' AND e2.Event != 'a' 
    GROUP BY
        e1.user_id,e1.created_at
 ) e3 GROUP BY user_id, spent;


+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
|       0 |         0 |     1 | b     |
|       0 |         3 |     4 | c     |
|       0 |         7 |     8 | d     |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)

In Vertica, the same query throws an error: ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function

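What is an elegant way to pair the two events with all of their columns, without cheating, so that running it in Vertica or any other database that doesn't allow the cheat gives the same result as shown above? In the sample data I only need one extra column, the event type, but in the real world it will be two or three columns.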

Please try it with the posted sample data before answering :)


3 Answers


OK, although I'm not 100% sure I understand what you're trying to do, see whether this works:

SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 
 JOIN events e on e3.user_id = e.user_id and e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event

Essentially, I'm just joining back to the events table, assuming that user_id and created_at together are your primary key.
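Here is a minimal sketch of this join-back approach. It again uses Python's `sqlite3` as an assumed stand-in for Vertica. Because `(user_id, created_at)` is assumed to identify a single row, the final `JOIN` brings back exactly one `events` row per `spent` timestamp. Any other columns of `e` can therefore be added to both the `SELECT` list and the `GROUP BY` without changing the number of result rows.

```python
import sqlite3

# Sample data from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

# e3 finds the timestamps; joining back to events on (user_id, spent)
# fetches the full "spent" row, so e.event needs no aggregate.
QUERY = """
SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM (
    SELECT e1.user_id AS user_id, e1.created_at AS purchased,
           MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE e1.user_id = e2.user_id AND e1.created_at <= e2.created_at
      AND e1.event = 'a' AND e2.event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3
JOIN events e ON e3.user_id = e.user_id AND e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event
ORDER BY e3.spent
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(0, 0, 1, 'b'), (0, 3, 4, 'c'), (0, 7, 8, 'd')]
```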

Here is the SQL Fiddle.

answered 2013-01-10T20:38:13.003

Try this...

With    Cte As
(
        Select  Row_Number() Over (Partition By user_id Order By created_at) As row_num,
                user_id,
                created_at,
                event
        From    events
)
Select  c1.user_id,
        c1.created_at As purchased,
        c2.created_at As spent,
        c2.event
From    Cte c1
Left    Join Cte c2
        On  c1.user_id = c2.user_id
        And c1.row_num = c2.row_num - 1
Where   c1.event = 'a'
And     c2.event <> 'a'
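Here is a small sketch of this `ROW_NUMBER()` pairing, using Python's `sqlite3` as an assumed stand-in for Vertica. SQLite needs version 3.25 or later for window functions. The sketch adds a hypothetical second user, `user_id = 1`, to show why the self-join must match on `user_id` as well as on `row_num`. Without that condition, row numbers from different users line up, and user 0's first `a` also gets paired with user 1's event `e`.

Looking only at the next row is enough here. The last `a` of a run is always immediately followed by the first non-`a` event, so this yields the same pairs as the `MAX(purchased)` query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d"),
     # Hypothetical second user, added only for this illustration.
     (1, 0, "a"), (1, 1, "e")],
)

def next_event_pairs(match_user: bool) -> list:
    """Pair each 'a' with the row right after it, if that row is not 'a'."""
    user_cond = "c1.user_id = c2.user_id AND " if match_user else ""
    sql = f"""
        WITH cte AS (
            SELECT ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS row_num,
                   user_id, created_at, event
            FROM events
        )
        SELECT c1.user_id, c1.created_at AS purchased,
               c2.created_at AS spent, c2.event
        FROM cte c1
        LEFT JOIN cte c2
          ON {user_cond}c1.row_num = c2.row_num - 1
        WHERE c1.event = 'a' AND c2.event <> 'a'
        ORDER BY c1.user_id, purchased, spent, c2.event
    """
    return conn.execute(sql).fetchall()

print(next_event_pairs(match_user=True))
# [(0, 0, 1, 'b'), (0, 3, 4, 'c'), (0, 7, 8, 'd'), (1, 0, 1, 'e')]
print((0, 0, 1, "e") in next_event_pairs(match_user=False))  # True: cross-user pairing
```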
answered 2013-01-10T20:38:14.087

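I usually do "next" calculations with a correlated subquery and then join back to the original table. In this case, I am assuming that , uniquely define a row.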

Here is the query:

SELECT user_id, MAX(purchased) as purchased, spent, event
FROM (
    SELECT e.User_ID, e.created_at AS purchased, 
           MIN(enext.created_at) AS spent,
           min(enext.event) AS event 
    FROM (select e.*,
                 (select MIN(e2.created_at)
                  from events e2
                  where e2.user_id = e.user_id and e2.created_at > e.created_at and e2.event <> 'a'
                 ) nextcreatedat
          from events e
          where e.event = 'a'
         ) e left outer join
         events enext
         on e.user_id = enext.user_id and
            e.nextcreatedat = enext.created_at
    GROUP BY e.user_id, e.created_at
    ) e3
 GROUP BY user_id, spent, event;

The aggregation GROUP BY e.user_id, e.created_at is not strictly necessary, but I left it in to keep the query similar to the original.
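The sketch below runs a corrected version of this query against the sample data. The corrections are the table name `events`, the column name `created_at`, and `event` added to the outer `GROUP BY` so that Vertica-style strict grouping accepts it. It uses Python's `sqlite3`, assumed here only as a stand-in for Vertica.

Because of the `LEFT OUTER JOIN`, trailing `a` events with no later non-`a` event would produce a row whose `spent` and `event` are NULL instead of being dropped. The sample data has no such rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

QUERY = """
SELECT user_id, MAX(purchased) AS purchased, spent, event
FROM (
    SELECT e.user_id, e.created_at AS purchased,
           MIN(enext.created_at) AS spent,
           MIN(enext.event) AS event
    FROM (SELECT e.*,
                 -- correlated subquery: timestamp of the next non-'a' event
                 (SELECT MIN(e2.created_at)
                  FROM events e2
                  WHERE e2.user_id = e.user_id AND e2.created_at > e.created_at
                    AND e2.event <> 'a') AS nextcreatedat
          FROM events e
          WHERE e.event = 'a') e
    LEFT OUTER JOIN events enext
      ON e.user_id = enext.user_id AND e.nextcreatedat = enext.created_at
    GROUP BY e.user_id, e.created_at
) e3
GROUP BY user_id, spent, event
ORDER BY spent
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(0, 0, 1, 'b'), (0, 3, 4, 'c'), (0, 7, 8, 'd')]
```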

Because Vertica supports cumulative sums, there is a more efficient way to do this, but it won't work in MySQL.
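The answer does not show the cumulative-sum version, so what follows is only one possible reading of it. It is sketched in Python's `sqlite3` (an assumed stand-in for Vertica; window functions need SQLite 3.25 or later).

The idea is to take a running count of non-`a` events from the latest event backwards, which gives every row a group number. The column name `grp` is hypothetical. Each group is then a run of `a` events followed by exactly one non-`a` event, so the pairing becomes an equality join on `(user_id, grp)` instead of the inequality self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id int, created_at int, event varchar(20))")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

QUERY = """
WITH g AS (
    -- grp = number of non-'a' events at or after this row (per user),
    -- so each group is a run of 'a' events plus the one non-'a' that ends it.
    SELECT user_id, created_at, event,
           SUM(CASE WHEN event <> 'a' THEN 1 ELSE 0 END) OVER (
               PARTITION BY user_id ORDER BY created_at DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM events
)
SELECT a.user_id, MAX(a.created_at) AS purchased, n.created_at AS spent, n.event
FROM g a
JOIN g n ON a.user_id = n.user_id AND a.grp = n.grp
WHERE a.event = 'a' AND n.event <> 'a'
GROUP BY a.user_id, n.created_at, n.event
ORDER BY spent
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(0, 0, 1, 'b'), (0, 3, 4, 'c'), (0, 7, 8, 'd')]
```

Trailing `a` events with no later non-`a` event get `grp = 0` and find no partner. This matches the inner join in the question's query.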

answered 2013-01-10T20:45:55.817