1

我有两个表,如下(从实际简化):

mysql> desc small_table;
+-----------------+---------------+------+-----+-- --------+-------+
| 领域 | 类型 | 空 | 钥匙 | 默认 | 额外 |
+-----------------+---------------+------+-----+-- --------+-------+
| 事件时间 | 日期时间 | 否 | | 空 | |
| 用户 ID | 字符(15) | 否 | | 空 | |
| 其他数据 | 整数(11) | 否 | 穆尔 | 空 | |
+-----------------+---------------+------+-----+-- --------+-------+
3 行一组(0.00 秒)

mysql> desc large_table;
+-----------------+---------------+------+-----+-- --------+-------+
| 领域 | 类型 | 空 | 钥匙 | 默认 | 额外 |
+-----------------+---------------+------+-----+-- --------+-------+
| 事件时间 | 日期时间 | 否 | | 空 | |
| 用户 ID | 字符(15) | 否 | | 空 | |
| 其他数据 | 整数(11) | 否 | | 空 | |
+-----------------+---------------+------+-----+-- --------+-------+
3 行一组(0.00 秒)

现在,small_table嗯,很小:对于每个user_id通常只有一排(尽管有时会有更多)。large_table另一方面,在 中,每个都出现user_id了无数次。

mysql> 从 small_table\G 中选择 count(1)
****************************** 1. 行 ************************ *******
计数(1):20182
一组中的 1 行(0.00 秒)


mysql> 从 large_table\G 中选择 count(1)
****************************** 1. 行 ************************ *******
计数(1):2870522
一组中的 1 行(0.00 秒)

但是,这很重要,对于 中的每一行small_table,至少有一行large_table具有相同user_id、相同other_data和相似event_time(例如,几分钟内相同)。

我想知道是否small_table有对应于第一行、第二行或相同和相似的任何不同行的。也就是说,我想:large_tableuser_idevent_time

  1. 对于每个,按顺序排列user_id的不同行的计数,但仅限于例如三个小时内;也就是说,我只寻找相隔三小时之内的行数;和large_tableevent_timeevent_timeevent_time
  2. 对于每个这样的不同行集合,该列表中哪一行的标识(按顺序event_time)在small_table.

我似乎无法编写一个可以完成第一步的查询,更不用说一个可以完成第二步的查询了,并且希望得到任何指导。

4

3 回答 3

1

为此所需的 SQL 是残酷的;它会给你的优化器一个非常严肃的锻炼。

从问题和问题之后的评论来看,如果它们都落在相邻事件之间的某个固定间隔内,则希望将大表中给定用户 ID 的事件序列视为“连续的”。例如,固定间隔为 3 小时。我正在为 IBM Informix Dynamic Server 编码(为了争论,版本 11.70,但 11.50 也可以正常工作)。这意味着我需要解释一个特殊的符号。具体来说,符号3 UNITS HOUR表示3小时的间隔;它也可以用INTERVAL(3) HOUR TO HOURSQL 的 Informix 方言或INTERVAL '3' HOUR标准 SQL 编写。

生成 SQL 有几个关键技术,尤其是复杂的 SQL。一种是逐步、零碎地构建 SQL,组装最终结果。另一个是确保你清楚地说明你所追求的。

在下面的符号中,“for the same User_ID”的限定应被视为始终是表达式的一部分。

在大表中,在加入小表之前,我们要考虑三类范围。

  1. 既没有足够接近的事件之前的事件时间行​​也没有足够接近的事件之后的事件时间的行的条目。这是一个开始和结束时间相同的时间范围。
  2. 表中的一对条目,它们本身足够接近,但既没有事件早于足够接近的对的早期事件,也没有事件晚于足够接近的对的晚期事件,也没有事件在这对之间。
  3. 表中三个或更多条目的序列,其中有:
    • 没有事件 E1 早于足够接近的最早事件
    • 没有事件 E2 晚于足够接近的最新事件
    • 晚于最早的事件 E3 与最早的足够接近
    • 一个比最新事件更早的事件 E4 与最新事件足够接近(E4 可能与 E3 事件相同)
    • 最早和最晚之间没有一对事件 E5、E6,其中 E5 和 E6 之间没有事件,但 E5 和 E6 之间的差距太大而无法计算。

从描述中可以看出,这将是一些可怕的 SQL!

注意:代码现在已经过测试;一些(主要是小的)改变是必要的。一项不必要的小改动是在中间查询中添加了 ORDER BY 子句。另一个不必要的更改是从小表中选择其他数据进行验证。此修订是在没有研究msh210发布的修订版本的情况下进行的。

另请注意,我不确定这是一个最小的公式;使用单个 SELECT 语句而不是三个 SELECT 语句的 UNION 对所有范围进行分类可能是可行的(如果是这样的话会很好)。

单例范围

-- Ranges of exactly 1 event
SELECT lt1.user_id, lt1.event_time AS min_time, lt1.event_time AS max_time
  FROM Large_Table AS lt1
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt1.event_time
           AND lt4.event_time < lt1.event_time + 3 UNITS HOUR
       )
ORDER BY User_ID, Min_Time;

道尔顿山脉

-- Ranges of exactly 2 events
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
   AND lt2.event_time < lt1.event_time + 3 UNITS HOUR
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- any event in between
       (SELECT *
          FROM Large_Table AS lt5
         WHERE lt1.user_id = lt5.user_id
           AND lt5.event_time > lt1.event_time
           AND lt5.event_time < lt2.event_time
       )
ORDER BY User_ID, Min_Time;

在外部 WHERE 子句中添加了 3 小时标准。

多个事件范围

-- Ranges of 3 or more events
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt5.event_time < lt6.event_time
         WHERE lt1.user_id = lt5.user_id
           AND lt6.event_time < lt2.event_time
           AND lt5.event_time > lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           AND NOT EXISTS -- an event in between these two
               (SELECT *
                  FROM Large_Table AS lt9
                 WHERE lt5.user_id = lt9.user_id
                   AND lt9.event_time > lt5.event_time
                   AND lt9.event_time < lt6.event_time
               )
       )
   AND EXISTS -- an event close enough after the start
       (SELECT *
          FROM Large_Table AS lt7
         WHERE lt1.user_id = lt7.user_id
           AND lt1.event_time < lt7.event_time
           AND lt7.event_time < lt1.event_time + 3 UNITS HOUR
           AND lt7.event_time < lt2.event_time
       )
   AND EXISTS -- an event close enough before the end
       (SELECT *
          FROM Large_Table AS lt8
         WHERE lt2.user_id = lt8.user_id
           AND lt2.event_time > lt8.event_time
           AND lt8.event_time > lt2.event_time - 3 UNITS HOUR
           AND lt8.event_time > lt1.event_time
       )
ORDER BY User_ID, Min_Time;

在“big gaps”子查询中添加了省略的嵌套 NOT EXISTS 子句。

大表中的所有范围

显然,最后一个表中的完整范围列表是上述三个查询的并集。

查询因为不够有趣而被删除。它只是上面单独查询的 3 路 UNION。

最终查询

最后的查询在与小表中的条目足够接近的可怕的 3 部分 UNION 的结果中找到范围(如果有)。例如,小表中的单个条目可能落在 13:00,大表中可能有一个范围在 11:00 结束,另一个从 15:00 开始。大表中的两个范围是分开的(它们之间的间隔是 4 小时),但小表中的条目足够接近,可以计算。[测试涵盖了这种情况。]

SELECT S.User_id, S.Event_Time, L.Min_Time, L.Max_Time, S.Other_Data
  FROM Small_Table AS S
  JOIN (
        -- Ranges of exactly 1 event
        SELECT lt1.user_id, lt1.event_time AS min_time, lt1.event_time AS max_time
          FROM Large_Table AS lt1
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt1.event_time
                   AND lt4.event_time < lt1.event_time + 3 UNITS HOUR
               )
        UNION
        -- Ranges of exactly 2 events
        SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
          FROM Large_Table AS lt1
          JOIN Large_Table AS lt2
            ON lt1.user_id = lt2.user_id
           AND lt1.event_time < lt2.event_time
           AND lt2.event_time < lt1.event_time + 3 UNITS HOUR
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt2.event_time
                   AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
               )
           AND NOT EXISTS -- any event in between
               (SELECT *
                  FROM Large_Table AS lt5
                 WHERE lt1.user_id = lt5.user_id
                   AND lt5.event_time > lt1.event_time
                   AND lt5.event_time < lt2.event_time
               )
        UNION
        -- Ranges of 3 or more events
        SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
          FROM Large_Table AS lt1
          JOIN Large_Table AS lt2
            ON lt1.user_id = lt2.user_id
           AND lt1.event_time < lt2.event_time
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt2.event_time
                   AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
               )
           AND NOT EXISTS -- a gap that's too big in the events between first and last
               (SELECT *
                  FROM Large_Table AS lt5 -- E5 before E6
                  JOIN Large_Table AS lt6
                    ON lt5.user_id = lt6.user_id
                   AND lt5.event_time < lt6.event_time
                 WHERE lt1.user_id = lt5.user_id
                   AND lt6.event_time < lt2.event_time
                   AND lt5.event_time > lt1.event_time
                   AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
                   AND NOT EXISTS -- an event in between these two
                       (SELECT *
                          FROM Large_Table AS lt9
                         WHERE lt5.user_id = lt9.user_id
                           AND lt9.event_time > lt5.event_time
                           AND lt9.event_time < lt6.event_time
                       )
               )
           AND EXISTS -- an event close enough after the start
               (SELECT *
                  FROM Large_Table AS lt7
                 WHERE lt1.user_id = lt7.user_id
                   AND lt1.event_time < lt7.event_time
                   AND lt7.event_time < lt1.event_time + 3 UNITS HOUR
                   AND lt7.event_time < lt2.event_time
               )
           AND EXISTS -- an event close enough before the end
               (SELECT *
                  FROM Large_Table AS lt8
                 WHERE lt2.user_id = lt8.user_id
                   AND lt2.event_time > lt8.event_time
                   AND lt8.event_time > lt2.event_time - 3 UNITS HOUR
                   AND lt8.event_time > lt1.event_time
               )
       ) AS L
    ON S.User_ID = L.User_ID
 WHERE S.Event_Time > L.Min_Time - 3 UNITS HOUR
   AND S.Event_Time < L.Max_Time + 3 UNITS HOUR
ORDER BY User_ID, Event_Time, Min_Time;

OK - 公平警告;SQL 实际上并没有靠近 SQL DBMS。

代码现在已经过测试。极小的机会实际上是零。有一个语法错误和几个或多或少的小问题需要解决。

在设计了测试数据后,我分阶段进行了实验。我在验证和修复查询时使用了“Alpha”数据(见下文),并添加了 Beta 数据只是为了确保不同的 User_ID 值之间没有串扰。

我使用了显式<>操作而不是BETWEEN ... AND排除端点;如果您希望恰好相隔 3 小时的事件算作“足够接近”,那么您需要查看每个不等式,可能将它们更改为BETWEEN ... AND或可能仅使用>=<=酌情使用。

有一个松散相似但相当简单的问题的答案,(a)我写的和(b)对上面的复杂处理提供了一些有用的想法(特别是,'没有事件早但足够接近'和'没有事件稍后但接近“足够”的标准。“足够接近”的标准肯定会使这个问题复杂化。


测试数据

大桌子

CREATE TABLE Large_Table
(
    Event_Time  DATETIME YEAR TO MINUTE NOT NULL,
    User_ID     CHAR(15) NOT NULL,
    Other_Data  INTEGER NOT NULL,
    PRIMARY KEY(User_ID, Event_Time)
);

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 09:15', 'Alpha',  1) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 11:15', 'Alpha',  2) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 13:15', 'Alpha',  3) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 15:15', 'Alpha',  4) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 12:17', 'Beta',   1) { R4 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 09:15', 'Alpha',  5) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:17', 'Beta',   2) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 09:15', 'Alpha',  6) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 11:15', 'Alpha',  7) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 10:17', 'Beta',   3) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 09:15', 'Alpha',  8) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 11:15', 'Alpha',  9) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 13:15', 'Alpha', 10) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 10:17', 'Beta',   4) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 09:15', 'Alpha', 11) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 11:15', 'Alpha', 12) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 10:17', 'Beta',   5) { R1 };
{ Probe here }
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 15:15', 'Alpha', 13) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 17:15', 'Alpha', 14) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 16:17', 'Beta',   6) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 09:15', 'Alpha', 15) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 11:15', 'Alpha', 16) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 13:15', 'Alpha', 17) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 15:15', 'Alpha', 18) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 17:15', 'Alpha', 19) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 19:15', 'Alpha', 20) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 16:17', 'Beta',   7) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 09:15', 'Alpha', 21) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 11:17', 'Beta',   8) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 12:15', 'Alpha', 22) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 13:17', 'Beta',   9) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 09:15', 'Alpha', 23) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 11:15', 'Alpha', 24) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 13:15', 'Alpha', 25) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 15:15', 'Alpha', 26) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 17:15', 'Alpha', 27) { R5 };

小桌子

注意:为了测试的目的,小表实际上比大表包含更多的行。小表中 Other_Data 值大于 100 的行不应出现在结果中(也不应该出现)。这里的测试确实戳到了边缘条件。

CREATE TABLE Small_Table
(
    Event_Time  DATETIME YEAR TO MINUTE NOT NULL,
    User_ID     CHAR(15) NOT NULL,
    Other_Data  INTEGER NOT NULL,
    PRIMARY KEY(User_ID, Event_Time)
);

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 06:15', 'Alpha', 131) { XX };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 06:20', 'Alpha',  31) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 10:20', 'Alpha',  32) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 13:20', 'Alpha',  33) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 15:20', 'Alpha',  34) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 18:15', 'Alpha', 134) { XX };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 06:15', 'Alpha', 135) { XX };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 06:16', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:20', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 12:14', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 12:15', 'Alpha', 135) { XX };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 09:20', 'Alpha',  36) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 11:20', 'Alpha',  37) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 09:20', 'Alpha',  38) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 11:20', 'Alpha',  39) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 13:20', 'Alpha',  40) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 09:20', 'Alpha',  41) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 11:20', 'Alpha',  42) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 13:20', 'Alpha',  42) { 22 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 15:20', 'Alpha',  43) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 17:20', 'Alpha',  44) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 09:20', 'Alpha',  45) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 11:20', 'Alpha',  46) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 13:20', 'Alpha',  47) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 15:20', 'Alpha',  48) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 17:20', 'Alpha',  49) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 19:20', 'Alpha',  50) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 09:20', 'Alpha',  51) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 10:20', 'Alpha',  51) { 22 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 12:20', 'Alpha',  52) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 09:20', 'Alpha',  53) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 11:20', 'Alpha',  54) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 13:20', 'Alpha',  55) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 15:20', 'Alpha',  56) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 17:20', 'Alpha',  57) { YY };


INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 13:27', 'Beta',   9) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 11:27', 'Beta',   8) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 16:27', 'Beta',   7) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 16:27', 'Beta',   6) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 10:27', 'Beta',   5) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 10:27', 'Beta',   4) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 10:27', 'Beta',   3) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:27', 'Beta',   2) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 12:27', 'Beta',   1) { R4 };

最终查询结果

使用上面的数据,得到的结果是:

Alpha   2012-01-01 06:20   2012-01-01 09:15   2012-01-01 15:15   31
Alpha   2012-01-01 10:20   2012-01-01 09:15   2012-01-01 15:15   32
Alpha   2012-01-01 13:20   2012-01-01 09:15   2012-01-01 15:15   33
Alpha   2012-01-01 15:20   2012-01-01 09:15   2012-01-01 15:15   34
Alpha   2012-01-02 06:16   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-02 10:20   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-02 12:14   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-03 09:20   2012-01-03 09:15   2012-01-03 11:15   36
Alpha   2012-01-03 11:20   2012-01-03 09:15   2012-01-03 11:15   37
Alpha   2012-01-04 09:20   2012-01-04 09:15   2012-01-04 13:15   38
Alpha   2012-01-04 11:20   2012-01-04 09:15   2012-01-04 13:15   39
Alpha   2012-01-04 13:20   2012-01-04 09:15   2012-01-04 13:15   40
Alpha   2012-01-05 09:20   2012-01-05 09:15   2012-01-05 11:15   41
Alpha   2012-01-05 11:20   2012-01-05 09:15   2012-01-05 11:15   42
Alpha   2012-01-05 13:20   2012-01-05 09:15   2012-01-05 11:15   42
Alpha   2012-01-05 13:20   2012-01-05 15:15   2012-01-05 17:15   42
Alpha   2012-01-05 15:20   2012-01-05 15:15   2012-01-05 17:15   43
Alpha   2012-01-05 17:20   2012-01-05 15:15   2012-01-05 17:15   44
Alpha   2012-01-06 09:20   2012-01-06 09:15   2012-01-06 19:15   45
Alpha   2012-01-06 11:20   2012-01-06 09:15   2012-01-06 19:15   46
Alpha   2012-01-06 13:20   2012-01-06 09:15   2012-01-06 19:15   47
Alpha   2012-01-06 15:20   2012-01-06 09:15   2012-01-06 19:15   48
Alpha   2012-01-06 17:20   2012-01-06 09:15   2012-01-06 19:15   49
Alpha   2012-01-06 19:20   2012-01-06 09:15   2012-01-06 19:15   50
Alpha   2012-01-07 09:20   2012-01-07 09:15   2012-01-07 09:15   51
Alpha   2012-01-07 09:20   2012-01-07 12:15   2012-01-07 12:15   51
Alpha   2012-01-07 10:20   2012-01-07 09:15   2012-01-07 09:15   51
Alpha   2012-01-07 10:20   2012-01-07 12:15   2012-01-07 12:15   51
Alpha   2012-01-07 12:20   2012-01-07 12:15   2012-01-07 12:15   52
Alpha   2012-01-08 09:20   2012-01-08 09:15   2012-01-08 17:15   53
Alpha   2012-01-08 11:20   2012-01-08 09:15   2012-01-08 17:15   54
Alpha   2012-01-08 13:20   2012-01-08 09:15   2012-01-08 17:15   55
Alpha   2012-01-08 15:20   2012-01-08 09:15   2012-01-08 17:15   56
Alpha   2012-01-08 17:20   2012-01-08 09:15   2012-01-08 17:15   57
Beta    2012-01-01 12:27   2012-01-01 12:17   2012-01-01 12:17    1
Beta    2012-01-02 10:27   2012-01-02 10:17   2012-01-02 10:17    2
Beta    2012-01-03 10:27   2012-01-03 10:17   2012-01-03 10:17    3
Beta    2012-01-04 10:27   2012-01-04 10:17   2012-01-04 10:17    4
Beta    2012-01-05 10:27   2012-01-05 10:17   2012-01-05 10:17    5
Beta    2012-01-05 16:27   2012-01-05 16:17   2012-01-05 16:17    6
Beta    2012-01-06 16:27   2012-01-06 16:17   2012-01-06 16:17    7
Beta    2012-01-07 11:27   2012-01-07 11:17   2012-01-07 13:17    8
Beta    2012-01-07 13:27   2012-01-07 11:17   2012-01-07 13:17    9

中间结果

效果略有不同的格式。

单例范围

Alpha|2012-01-02 09:15|2012-01-02 09:15
Alpha|2012-01-07 09:15|2012-01-07 09:15
Alpha|2012-01-07 12:15|2012-01-07 12:15
Beta|2012-01-01 12:17|2012-01-01 12:17
Beta|2012-01-02 10:17|2012-01-02 10:17
Beta|2012-01-03 10:17|2012-01-03 10:17
Beta|2012-01-04 10:17|2012-01-04 10:17
Beta|2012-01-05 10:17|2012-01-05 10:17
Beta|2012-01-05 16:17|2012-01-05 16:17
Beta|2012-01-06 16:17|2012-01-06 16:17

道尔顿山脉

Alpha|2012-01-03 09:15|2012-01-03 11:15
Alpha|2012-01-05 09:15|2012-01-05 11:15
Alpha|2012-01-05 15:15|2012-01-05 17:15
Beta|2012-01-07 11:17|2012-01-07 13:17

多个事件范围

Alpha|2012-01-01 09:15|2012-01-01 15:15
Alpha|2012-01-04 09:15|2012-01-04 13:15
Alpha|2012-01-06 09:15|2012-01-06 19:15
Alpha|2012-01-08 09:15|2012-01-08 17:15
于 2012-01-15T07:04:57.770 回答
0
select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN (select distinct user_id from big_table where event_time between 'StartDate' and 'EndDate')
order by s.event_time

我不确定您在提到的小范围内要求什么。

还:

select * from large_table t1, large_table t2 
where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour)

所以,试试:

  select count(s.user_id), s.event_time, s.other_data from small_table s
    where s.user_id IN ( select * from large_table t1, large_table t2 
    where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour))
order by s.event_time
于 2012-01-13T17:55:16.327 回答
0

这也许应该是对Jonathan Leffler详细而有用的答案的评论,但是(a)它太长了,(b)它确实有助于回答我的问题,所以我将它作为答案发布。

Jonathan Leffler 的答案中标题为“Multiple Event Ranges”的代码找到了第二个实例在第一个实例之后不久,倒数第二个实例在最后一个实例之前的范围,并且没有出现大的中断,但是内部实例之间没有任何大的差距,甚至如果他们之间有其他情况。因此,例如,如果限制为 3 小时,则 1、2、4、6 和 7 处的实例将被禁止,因为 2 和 6 之间的差距。我认为正确的代码应该是(直接建立在 Jonathan Leffler 的):

SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt1.user_id = lt5.user_id
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt5.event_time >= lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           and not exists (
             select * from large_table as lt9 
               where lt9.event_time > lt5.event_time
                 and lt6.event_time > lt9.event_time
             )
       )

这消除了and existsJonathan Leffler 的答案中标题为“Multiple Event Ranges”的代码中最后两个 s 的需要,并且实际上消除了他的答案中对“Singleton Ranges”和“Doubleton Ranges”代码的需要。

除非我错过了什么。

于 2012-01-15T18:23:43.943 回答