mysql - Efficiently joining 2 tables to the same 2 tables in MySQL

Question

I have 2 tables that can be simplified to this structure:

Table 1:

+----+----------+---------------------+-------+
| id | descr_id |        date         | value |
+----+----------+---------------------+-------+
| 1  |        1 | 2013-09-20 16:39:06 |     1 |
+----+----------+---------------------+-------+
| 2  |        2 | 2013-09-20 16:44:06 |     1 |
+----+----------+---------------------+-------+
| 3  |        3 | 2013-09-20 16:49:06 |     5 |
+----+----------+---------------------+-------+
| 4  |        4 | 2013-09-20 16:44:06 |   894 |
+----+----------+---------------------+-------+

Table 2:

+----------+-------------+
| descr_id | description |
+----------+-------------+
|       1  | abc         |
+----------+-------------+
|       2  | abc         |
+----------+-------------+
|       3  | abc         |
+----------+-------------+
|       4  | DEF         |
+----------+-------------+

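I want to join the description onto table1, filter by description so that I only get rows where description = abc, and filter out "duplicate" rows. Two rows are duplicates if they have the same value and their dates are within 6 minutes of each other. My desired output table is below (assuming abc is the desired description filter).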

+----+----------+---------------------+-------+-------------+
| id | descr_id |        date         | value | description |
+----+----------+---------------------+-------+-------------+
| 1  |        1 | 2013-09-20 16:39:06 |     1 | abc         |
+----+----------+---------------------+-------+-------------+
| 3  |        3 | 2013-09-20 16:49:06 |     5 | abc         |
+----+----------+---------------------+-------+-------------+

The query I came up with is:

select * 
  from (
        select * 
          from table1 
          join table2 using(descr_id) 
         where label='abc'
       ) t1 
  left join (
        select * 
          from table1 
          join table2 using(descr_id) 
         where label='abc'
        ) t2 on( t1.date<t2.date and t1.date + interval 6 minute > t2.date) 
 where t1.value=t2.value.

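Unfortunately, this query takes a minute to run on my dataset and returns no results (although I believe there should be some). Is there a more efficient way to perform this query? Is there a way to name a derived table and reference it later in the same query? Also, why does my query return no results?

Thanks in advance for your help!

Edit: I want to keep the first of several samples with close timestamps.

My table1 has 6.1 million rows and my table2 has 30K, which made me realize that table2 has only one row for the description "abc". This means I can look up the descr_id beforehand and then use that id to avoid joining table2 in the big query, which makes it much more efficient. However, if my table2 were set up as shown above (which I admit would be poor database design), what would be a good way to perform such a query?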

score 1 · Accepted Answer

Try creating temporary tables and joining them:

CREATE TEMPORARY TABLE t1 AS (select * 
          FROM table1 
          JOIN table2 USING(descr_id) 
         WHERE label='abc');

CREATE TEMPORARY TABLE t2 AS (select * 
          FROM table1 
          JOIN table2 USING(descr_id) 
         WHERE label='abc');

SELECT *
FROM t1
LEFT JOIN t2 on( t1.date<t2.date and t1.date + interval 6 minute > t2.date) 
WHERE t1.value=t2.value;

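Temporary tables are cleaned up automatically when you disconnect from the database, so there is no need to drop them explicitly.

I originally had this, but I don't believe it meets the full requirements: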
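One thing worth keeping in mind with this approach: a table built with CREATE TABLE ... AS SELECT starts out with no indexes, so the self-join has nothing to seek on. A minimal sketch, assuming the temporary tables t1 and t2 from above and that an index on (value, date) suits the join condition (the index names are made up for illustration):

```sql
-- Hypothetical index names; the (value, date) column order is an assumption
-- based on the join matching on equal value and a date range.
ALTER TABLE t1 ADD INDEX idx_t1_value_date (value, date);
ALTER TABLE t2 ADD INDEX idx_t2_value_date (value, date);
```

Whether this helps on the real data is best checked with EXPLAIN on the final SELECT.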

SELECT t1.id,
       t1.descr_id,
       t1.date,
       t1.value,
       t2.description
FROM table1 t1
JOIN table2 t2 ON t1.descr_id = t2.descr_id
WHERE t2.description = 'abc'

This is essentially the same as the original query, but another option might be to create views and join the views, like this:

CREATE VIEW v1 AS
SELECT * FROM table1 JOIN table2 USING(descr_id) WHERE label='abc';

CREATE VIEW v2 AS
SELECT * FROM table1 JOIN table2 USING(descr_id) WHERE label='abc';

SELECT *
FROM v1
LEFT JOIN v2 on( v1.date<v2.date and v1.date + interval 6 minute > v2.date) 
WHERE v1.value=v2.value;

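Also, if you run this query regularly, you might consider loading the results of the first query into a staging table and doing the join on the staging table, like this: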

INSERT INTO staging
(SELECT * 
        FROM table1 
        JOIN table2 USING(descr_id) 
        WHERE label='abc');

SELECT *
    FROM staging s1
    LEFT JOIN staging s2 on( s1.date<s2.date and s1.date + interval 6 minute > s2.date) 
    WHERE s1.value=s2.value;

TRUNCATE TABLE staging;
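On the question of naming a derived table and referencing it later in the same query: on MySQL 8.0 or later, a common table expression can do that without temporary tables, views, or a staging table. The following is only a sketch under that version assumption; the CTE name filtered is made up, and label is the filter column from the question's query (the simplified schema calls it description).

```sql
-- Sketch only: WITH requires MySQL 8.0+.
-- `filtered` is a hypothetical name; `label` follows the question's query.
WITH filtered AS (
    SELECT *
      FROM table1
      JOIN table2 USING (descr_id)
     WHERE label = 'abc'
)
SELECT *
  FROM filtered f1
  JOIN filtered f2
    ON f1.value = f2.value
   AND f1.date < f2.date
   AND f1.date + INTERVAL 6 MINUTE > f2.date;
```

This uses a plain JOIN rather than LEFT JOIN because the WHERE t1.value=t2.value filter in the queries above already discards the unmatched rows a LEFT JOIN would keep, so the two forms should return the same rows.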

score 0 · Accepted Answer

Try using something like NOT EXISTS:

select * from table1 t1 join table2 t2 using(descr_id) where label='abc' and not exists (select * from table1 t11 join table2 t22 using(descr_id) where label='abc' and t1.date < t11.date and t1.date + interval 6 minute > t11.date)

You may need to double-check the (t1.date + interval 6 minute) syntax.
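A variant of the same idea that also compares value and looks backward in time, so that the first row of each close group is the one that survives (as the question's edit asks). This is a sketch, not a tested solution: it uses the description column from the simplified schema rather than label, and the aliases d1, t0 and d0 are made up.

```sql
-- Keep a row only if no EARLIER row with the same value and the same
-- description filter falls within the preceding 6 minutes.
SELECT t1.id, t1.descr_id, t1.date, t1.value, d1.description
  FROM table1 t1
  JOIN table2 d1 ON d1.descr_id = t1.descr_id
 WHERE d1.description = 'abc'
   AND NOT EXISTS (
         SELECT 1
           FROM table1 t0
           JOIN table2 d0 ON d0.descr_id = t0.descr_id
          WHERE d0.description = 'abc'
            AND t0.value = t1.value
            AND t0.date < t1.date
            AND t0.date > t1.date - INTERVAL 6 MINUTE
       );
```

Against the four sample rows in the question this keeps ids 1 and 3. Note that in a long chain of rows each less than 6 minutes after the previous one, only the very first row is kept, and rows with identical timestamps are all kept; whether that is the intended behaviour depends on the data.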
