mysql - Subquery without an additional column takes longer than the same subquery with the column

Question

I'm trying to get a running total using a subquery. (I'm using Metabase, which doesn't seem to accept/handle variables in queries.)

My query:

SELECT date_format(t.`session_stop`, '%d') AS `session_stop`, 
    sum(t.`energy_used` / 1000) AS `csum`,
    (
      SELECT (SUM(a.`energy_used`) / 1000)
      FROM `sessions` a 
      WHERE date_format(a.`session_stop`, '%Y-%m-%d') <=  date_format(t.`session_stop`, '%Y-%m-%d') 
      AND str_to_date(concat(date_format(a.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
      ORDER BY str_to_date(date_format(a.`session_stop`, '%e'), '%d') ASC
    ) AS `sum`
    FROM `sessions` t
    WHERE str_to_date(concat(date_format(t.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
    GROUP BY date_format(t.`session_stop`, '%e')
    ORDER BY str_to_date(date_format(t.`session_stop`, '%d'), '%d') ASC;

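It takes about 1.29 seconds to run. (43K rows in total, 14 returned.)

If I remove the sum(t.`energy_used` / 1000) AS `csum`, line, the query takes 8 minutes 40 seconds.

Why is this? I'd rather not have that line, but I also can't wait 8 minutes for the query to finish.

(I know I could create a cumulative column, but I'm specifically interested in why this extra sum() speeds up the whole query.)

P.S. Tested on both the MySQL console and the Metabase interface.

EXPLAIN for the query: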

+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
|  1 | PRIMARY            | t     | ALL  | NULL          | NULL | NULL    | NULL | 42055 | Using where; Using tempora
|  2 | DEPENDENT SUBQUERY | a     | ALL  | NULL          | NULL | NULL    | NULL | 42055 | Using where
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------
2 rows in set (0.00 sec)

Without the extra sum():

+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | PRIMARY            | t     | ALL  | NULL          | NULL | NULL    | NULL | 44976 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | a     | ALL  | NULL          | NULL | NULL    | NULL | 44976 | Using where                                  |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

The schema is nothing more than a single table containing:

session_id (INT, auto incr., prim. key) | session_stop (datetime) | energy_used (INT)
1                                       | 1-1-2016 10:00:00       | 123456
2                                       | 1-1-2016 10:05:00       | 123456
3                                       | 1-2-2016 10:10:00       | 123456
4                                       | 1-2-2016 12:00:00       | 123456
5                                       | 3-3-2016 14:05:00       | 123456

Some examples on the internet use the ID in the WHERE clause, but that gave me poor results.

score 1 · Accepted Answer

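Your queries are not similar at all. In fact, they are poles apart.

"If I remove the sum(t.`energy_used` / 1000) AS `csum`, line, the query takes 8 minutes 40 seconds."

When you use SUM, it's an aggregate. sum(t.`energy_used` / 1000) produces a completely different result from just selecting t.`energy_used`, and that is why there is such a massive difference in the query times.

It's also very unclear why you are comparing dates this way: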

WHERE date_format(a.`session_stop`, '%Y-%m-%d') <=      date_format(t.`session_stop`, '%Y-%m-%d')

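Why are you converting both of them with date_format before comparing? Since both tables obviously contain the same data type, you should be able to use a.session_stop <= t.session_stop in both cases, which would be a lot faster.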
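The direct comparison could be sketched as follows. This assumes the `sessions` table from the question; the alias `running_total` is made up for illustration. Note that comparing the raw datetimes is stricter than the question's day-level comparison (it only counts sessions that stopped at or before the current row), so the second variant keeps day granularity by comparing against the start of the following day, while still leaving `a.session_stop` free of function calls.

```sql
-- Variant 1: compare the datetime values directly.
-- Each row's total covers sessions that stopped at or before that row's time.
SELECT t.session_id,
       t.session_stop,
       (SELECT SUM(a.energy_used) / 1000
          FROM sessions a
         WHERE a.session_stop <= t.session_stop) AS running_total
  FROM sessions t
 ORDER BY t.session_stop;

-- Variant 2: keep day-level semantics (everything up to the end of t's day)
-- without formatting a.session_stop as a string.
SELECT t.session_id,
       t.session_stop,
       (SELECT SUM(a.energy_used) / 1000
          FROM sessions a
         WHERE a.session_stop < DATE(t.session_stop) + INTERVAL 1 DAY) AS running_total
  FROM sessions t
 ORDER BY t.session_stop;
```

Neither variant includes the current-month filter from the question; it is left out here to keep the comparison itself in focus.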

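Since it's an inequality comparison, it's not an ideal candidate for an index, but you can still try creating an index on that column to see whether it has any effect.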
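Trying the index might look like the sketch below. The index name `idx_sessions_session_stop` and the alias `month_total` are hypothetical. A plain index on `session_stop` generally cannot be used while the column is wrapped in date_format() or str_to_date(), so the current-month filter is rewritten here as a range on the bare column. Whether the optimizer actually picks the index depends on the data and the MySQL version, which is what the EXPLAIN is there to check.

```sql
-- Hypothetical index name; the column comes from the question's schema.
CREATE INDEX idx_sessions_session_stop ON sessions (session_stop);

-- Current-month filter expressed as a half-open range on the bare column:
-- from the first day of this month up to (not including) the first day of the next.
EXPLAIN
SELECT SUM(t.energy_used) / 1000 AS month_total
  FROM sessions t
 WHERE t.session_stop >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
   AND t.session_stop <  DATE_ADD(DATE_FORMAT(CURDATE(), '%Y-%m-01'), INTERVAL 1 MONTH);
```

If the `key` column of the EXPLAIN output still shows NULL, the index is not being used for this predicate.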

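To recap, the performance difference arises because you are not merely adding/removing an extra column, you are adding/removing an aggregation.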
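One way to make the aggregation explicit, rather than relying on whether the extra sum() is present, is to group the month's rows by day first and then compute the running total once per grouped day. The following is only a sketch under assumptions: the derived-table alias `days` and the column names `session_day`, `day_total`, and `running_total` are made up, it still avoids query variables (which Metabase reportedly does not handle), and it has not been timed against the asker's data.

```sql
SELECT DATE_FORMAT(days.session_day, '%d') AS session_stop,
       days.day_total,                      -- per-day total; drop if not needed
       (SELECT SUM(a.energy_used) / 1000
          FROM sessions a
         WHERE a.session_stop >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
           AND a.session_stop <  days.session_day + INTERVAL 1 DAY) AS running_total
  FROM (
        -- One row per calendar day of the current month.
        SELECT DATE(t.session_stop)      AS session_day,
               SUM(t.energy_used) / 1000 AS day_total
          FROM sessions t
         WHERE t.session_stop >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
           AND t.session_stop <  DATE_ADD(DATE_FORMAT(CURDATE(), '%Y-%m-01'), INTERVAL 1 MONTH)
         GROUP BY DATE(t.session_stop)
       ) AS days
 ORDER BY days.session_day;
```

The correlated subquery here is written against the grouped rows (14 in the asker's case) rather than against every row of `sessions`; running EXPLAIN on it would show how the server actually executes it.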
