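Right, there's an unavoidable Cartesian product here: joining both income and expenses to people in a single query pairs every income row for a person with every expense row for that person. You can avoid it by breaking the problem into two subqueries.
One for income: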
SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
FROM people p
LEFT JOIN income i ON p.id = i.person_id
GROUP BY p.id;
+----+---------+------------+----------------+
| id | name    | income_sum | work_hours_sum |
+----+---------+------------+----------------+
|  1 | Groucho |      20.00 |             20 |
|  2 | Harpo   |      40.00 |             40 |
|  3 | Chico   |      60.00 |             60 |
+----+---------+------------+----------------+
Here is the EXPLAIN for that query:
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using temporary; Using filesort                    |
|  1 | SIMPLE      | i     | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
One for expenses:
SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
FROM people p
LEFT JOIN expenses e ON p.id = e.person_id
GROUP BY p.id;
+----+--------------+-------------+
| id | expenses_sum | items_count |
+----+--------------+-------------+
|  1 |        30.00 |           4 |
|  2 |        30.00 |           4 |
|  3 |        30.00 |           4 |
+----+--------------+-------------+
Here is the EXPLAIN:
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using temporary; Using filesort                    |
|  1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
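The EXPLAIN reports above show that the queries use table scans (type "ALL") on the income and expenses tables, and join without the aid of an index ("Using join buffer"). The red flag is that both tables participating in the join use access type "ALL". This becomes very expensive as the number of rows in these tables grows. It usually goes hand in hand with "Using join buffer", which is another red flag for an expensive query.
Finally, the queries use a temporary table and a filesort to do the GROUP BY inefficiently. This is another performance killer.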
The "(Block Nested Loop)" note in the Extra column is a MySQL 5.6 thing. If you use an earlier version of MySQL, you won't see it.
The following indexes should help make these queries better:
ALTER TABLE income ADD KEY (person_id, amount, number_of_hours_for_amount);
ALTER TABLE expenses ADD KEY (person_id, amount, number_of_items_bought);
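Now the EXPLAIN reports no longer show inefficient access. The join is done by index (type "ref"), and the temporary table and filesort are gone. "Using index" means the joined table is accessed only through columns in the index, without touching the table rows at all.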
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
|  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY   | 4       | NULL      |    3 | NULL        |
|  1 | SIMPLE      | i     | ref   | person_id     | person_id | 5       | test.p.id |    1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
|  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY   | 4       | NULL      |    3 | NULL        |
|  1 | SIMPLE      | e     | ref   | person_id     | person_id | 5       | test.p.id |    1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
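For anyone who wants to poke at the covering-index effect without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not a reproduction of the reports above: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, the index name idx_income_cover is made up for this example, and the sample rows are hypothetical values chosen only so that the totals match the income results shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income (
        person_id INTEGER,
        amount NUMERIC,
        number_of_hours_for_amount INTEGER
    );
    -- Hypothetical rows: two income rows per person.
    INSERT INTO people VALUES (1, 'Groucho'), (2, 'Harpo'), (3, 'Chico');
    INSERT INTO income VALUES (1, 10, 10), (1, 10, 10),
                              (2, 20, 20), (2, 20, 20),
                              (3, 30, 30), (3, 30, 30);
""")

QUERY = """
    SELECT p.id, SUM(i.amount), SUM(i.number_of_hours_for_amount)
    FROM people p
    LEFT JOIN income i ON p.id = i.person_id
    GROUP BY p.id
"""

def plan(connection, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(conn, QUERY)

# Same column order as the ALTER TABLE ... ADD KEY statement for income.
conn.execute(
    "CREATE INDEX idx_income_cover "
    "ON income (person_id, amount, number_of_hours_for_amount)"
)
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

After the index is created, the plan line for income should name idx_income_cover. SQLite reports an index-only lookup as USING COVERING INDEX, which plays the same role as "Using index" in MySQL's Extra column.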
You said you want to do this in one query, so here is how to do it.
We can combine the two separate queries into a single query that returns one row per person:
SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM
  (SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
   FROM people p
   LEFT OUTER JOIN income i ON p.id = i.person_id
   GROUP BY p.id) AS subq_i
INNER JOIN
  (SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
   FROM people p
   LEFT OUTER JOIN expenses e ON p.id = e.person_id
   GROUP BY p.id) AS subq_e
USING (id);
+---------+------------+----------------+--------------+-------------+
| name    | income_sum | work_hours_sum | expenses_sum | items_count |
+---------+------------+----------------+--------------+-------------+
| Groucho |      20.00 |             20 |        30.00 |           4 |
| Harpo   |      40.00 |             40 |        30.00 |           4 |
| Chico   |      60.00 |             60 |        30.00 |           4 |
+---------+------------+----------------+--------------+-------------+
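Even for this joined query, the EXPLAIN doesn't look too bad. There are no temporary tables, filesorts, or join buffers, and it makes good use of the covering indexes.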
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref       | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL        | NULL    | NULL      |    3 | NULL        |
|  1 | PRIMARY     | <derived3> | ref   | <auto_key0>   | <auto_key0> | 4       | subq_i.id |    2 | NULL        |
|  3 | DERIVED     | p          | index | PRIMARY       | PRIMARY     | 4       | NULL      |    3 | Using index |
|  3 | DERIVED     | e          | ref   | person_id     | person_id   | 5       | test.p.id |    1 | Using index |
|  2 | DERIVED     | p          | index | PRIMARY       | PRIMARY     | 4       | NULL      |    3 | NULL        |
|  2 | DERIVED     | i          | ref   | person_id     | person_id   | 5       | test.p.id |    1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
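To see the Cartesian product from the top of this answer in action, the sketch below runs both shapes of the query side by side: a single query that joins people to income and expenses at once, and the derived-table version. It assumes SQLite (via Python's sqlite3 module) instead of MySQL, and the rows are hypothetical, picked so that each person has two income rows and two expense rows and the per-person totals match the results shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income (
        person_id INTEGER, amount NUMERIC, number_of_hours_for_amount INTEGER
    );
    CREATE TABLE expenses (
        person_id INTEGER, amount NUMERIC, number_of_items_bought INTEGER
    );
    -- Hypothetical rows: two income and two expense rows per person.
    INSERT INTO people VALUES (1, 'Groucho'), (2, 'Harpo'), (3, 'Chico');
    INSERT INTO income VALUES (1, 10, 10), (1, 10, 10),
                              (2, 20, 20), (2, 20, 20),
                              (3, 30, 30), (3, 30, 30);
    INSERT INTO expenses VALUES (1, 15, 2), (1, 15, 2),
                                (2, 15, 2), (2, 15, 2),
                                (3, 15, 2), (3, 15, 2);
""")

# Naive version: each income row is paired with each expense row
# of the same person before the SUMs are taken.
naive = conn.execute("""
    SELECT p.name, SUM(i.amount), SUM(e.amount)
    FROM people p
    LEFT JOIN income i ON p.id = i.person_id
    LEFT JOIN expenses e ON p.id = e.person_id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Derived-table version: aggregate each detail table on its own,
# then join the two one-row-per-person results.
split = conn.execute("""
    SELECT subq_i.name, income_sum, expenses_sum
    FROM
      (SELECT p.id, p.name, SUM(i.amount) AS income_sum
       FROM people p
       LEFT JOIN income i ON p.id = i.person_id
       GROUP BY p.id) AS subq_i
    INNER JOIN
      (SELECT p.id, SUM(e.amount) AS expenses_sum
       FROM people p
       LEFT JOIN expenses e ON p.id = e.person_id
       GROUP BY p.id) AS subq_e
    USING (id)
    ORDER BY subq_i.id
""").fetchall()

print("naive:", naive)
print("split:", split)
```

With two rows per person on each side, the naive join produces four rows per person, so both of its sums come out doubled (Groucho shows 40 and 60), while the derived-table version returns the correct 20 and 30.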