2

I am using MySQL. I have three tables.

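  1. The people table, which consists of two columns:

    • id - the table's primary key
    • name - the person's name
  2. The income table, which holds the incomes of the people in the people table. Each record in this table represents one income of one person. A person may have zero or more incomes in this table. The table structure is:

    • person_id (foreign key to the people table)
    • amount (DECIMAL - the amount of money)
    • number_of_hours_for_amount (integer - the number of hours worked to earn this income)
  3. The expenses table, which holds people's expenses. Each record in this table represents a single expense of one person, along with how many items he bought in that expense. A person may have zero or more expense records in this table. The table structure is:

    • person_id (foreign key to the people table)
    • amount (DECIMAL - the amount of money)
    • number_of_items_bought (integer - the number of items bought in this expense)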

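What I want is a single query that gives me a list of all people (one record per person), where each row contains:

  • the person's name,
  • the sum of all his incomes,
  • the total number of hours he worked,
  • the sum of all his expenses,
  • the total number of items he bought.

The first, naive approach I tried is logically correct but performs poorly. It looks like this: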

SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM (people
      LEFT JOIN 
           (SELECT person_id, sum(amount) as income_sum, 
                   sum(number_of_hours_for_amount) as work_hours_sum
            FROM income
            GROUP BY person_id) as income_subquery
      ON people.id = income_subquery.person_id)

LEFT JOIN
     (SELECT person_id, sum(amount) as expenses_sum, 
             sum(number_of_items_bought) as items_count
      FROM expenses
      GROUP BY person_id) as income_subquery
ON people.id = income_subquery.person_id

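As far as I understand, the problem with this query is that once the data comes out of the subqueries, the joins are very inefficient: the subquery results are temporary derived tables, so the indexes on the underlying tables are not put to good use.

The best way to use the existing indexes would be to join the three tables directly rather than through subqueries. But that is not a correct solution, because it creates a Cartesian product: each record appears more times than it should, so duplicated values are added into the aggregated sums.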
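
To make the duplication concrete, here is a minimal sketch of the problem. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows (one person named Alice with two incomes and three expenses) are made up for illustration. The direct three-way join yields 2 x 3 = 6 rows for that person, so each income is counted three times and each expense twice.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income (person_id INTEGER, amount NUMERIC,
                         number_of_hours_for_amount INTEGER);
    CREATE TABLE expenses (person_id INTEGER, amount NUMERIC,
                           number_of_items_bought INTEGER);
    INSERT INTO people VALUES (1, 'Alice');
    INSERT INTO income VALUES (1, 10, 5), (1, 20, 8);
    INSERT INTO expenses VALUES (1, 7, 1), (1, 3, 2), (1, 5, 4);
""")

# Direct join of all three tables: every income row is paired with
# every expense row of the same person before the SUMs are computed.
naive = conn.execute("""
    SELECT p.name, SUM(i.amount), SUM(e.amount)
    FROM people p
    LEFT JOIN income i ON i.person_id = p.id
    LEFT JOIN expenses e ON e.person_id = p.id
    GROUP BY p.id
""").fetchone()

# Aggregating each child table on its own gives the intended totals.
correct = conn.execute("""
    SELECT p.name,
           (SELECT SUM(amount) FROM income WHERE person_id = p.id),
           (SELECT SUM(amount) FROM expenses WHERE person_id = p.id)
    FROM people p
""").fetchone()

print(naive)    # income and expense sums are inflated
print(correct)  # the true totals
```

With this data the true totals are 30 and 15, while the direct join reports 90 and 30.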

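(Another option I tried was to compute each person's income and expense values as select_expressions in the SELECT part, that is, as correlated subqueries. That was not fast enough either.)

I am looking for a query that is efficient and gives me these results.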

4 Answers

3

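You're right: joining all three tables directly produces an unavoidable Cartesian product between each person's income rows and expense rows. You can break the problem down into two subqueries:

One for income: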

SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
FROM people p
LEFT JOIN income i ON p.id = i.person_id
GROUP BY p.id;

+----+---------+------------+----------------+
| id | name    | income_sum | work_hours_sum |
+----+---------+------------+----------------+
|  1 | Groucho |      20.00 |             20 |
|  2 | Harpo   |      40.00 |             40 |
|  3 | Chico   |      60.00 |             60 |
+----+---------+------------+----------------+

Here is the EXPLAIN for that query:

+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using temporary; Using filesort                    |
|  1 | SIMPLE      | i     | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+

One for expenses:

SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
FROM people p
LEFT JOIN expenses e ON p.id = e.person_id
GROUP BY p.id;

+----+--------------+-------------+
| id | expenses_sum | items_count |
+----+--------------+-------------+
|  1 |        30.00 |           4 |
|  2 |        30.00 |           4 |
|  3 |        30.00 |           4 |
+----+--------------+-------------+

Here is the EXPLAIN:

+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using temporary; Using filesort                    |
|  1 | SIMPLE      | e     | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+

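The EXPLAIN reports above show that the queries use a table scan (type "ALL") on the income and expenses tables, and that the join is done without an index ("Using join buffer"). The red flag is a join in which both tables use access type "ALL". With many rows in those tables, this becomes very expensive. It usually goes together with "Using join buffer", which is another red flag for an expensive query.

Finally, it uses a temporary table and a filesort ("Using temporary; Using filesort") to perform the GROUP BY inefficiently. That is another performance killer.

The "Block Nested Loop" note is a MySQL 5.6 thing. If you use an earlier version of MySQL, you won't see it.

The following indexes should help make these queries better: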

ALTER TABLE income ADD KEY (person_id, amount, number_of_hours_for_amount);
ALTER TABLE expenses ADD KEY (person_id, amount, number_of_items_bought);

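Now the EXPLAIN reports no longer show inefficient access. The joins are done through an index (type "ref"), and the temporary table and filesort are gone. "Using index" means the joined table is accessed through the columns in the index alone, without touching the table rows at all.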

+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
|  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY   | 4       | NULL      |    3 | NULL        |
|  1 | SIMPLE      | i     | ref   | person_id     | person_id | 5       | test.p.id |    1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+

+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
| id | select_type | table | type  | possible_keys | key       | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
|  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY   | 4       | NULL      |    3 | NULL        |
|  1 | SIMPLE      | e     | ref   | person_id     | person_id | 5       | test.p.id |    1 | Using index |
+----+-------------+-------+-------+---------------+-----------+---------+-----------+------+-------------+
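
If you want to try the covering-index idea without a MySQL server at hand, the following is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the index name income_cover is made up for this example. The point it illustrates is the same, though: when the index contains every column the query reads from income, the plan reports a covering index.

```python
import sqlite3

# SQLite stands in for MySQL; income_cover is a hypothetical index name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income (person_id INTEGER, amount NUMERIC,
                         number_of_hours_for_amount INTEGER);
    CREATE INDEX income_cover
        ON income (person_id, amount, number_of_hours_for_amount);
""")

# Ask SQLite how it would execute the per-person income aggregation.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT p.id, SUM(i.amount), SUM(i.number_of_hours_for_amount)
    FROM people p
    LEFT JOIN income i ON p.id = i.person_id
    GROUP BY p.id
""").fetchall()

# The fourth column of each plan row holds the human-readable detail.
plan_text = " | ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should mention "COVERING INDEX income_cover" for the income table, which is SQLite's counterpart of MySQL's "Using index".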

You said you want to do this in a single query, so here is how:

We can combine these two separate queries into one that returns one row per person:

SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM
(SELECT p.id, p.name, SUM(i.amount) AS income_sum, SUM(number_of_hours_for_amount) AS work_hours_sum
 FROM people p
 LEFT OUTER JOIN income i ON p.id = i.person_id
 GROUP BY p.id) AS subq_i
INNER JOIN
(SELECT p.id, SUM(e.amount) AS expenses_sum, SUM(number_of_items_bought) AS items_count
 FROM people p
 LEFT OUTER JOIN expenses e ON p.id = e.person_id
 GROUP BY p.id) AS subq_e
USING (id);

+---------+------------+----------------+--------------+-------------+
| name    | income_sum | work_hours_sum | expenses_sum | items_count |
+---------+------------+----------------+--------------+-------------+
| Groucho |      20.00 |             20 |        30.00 |           4 |
| Harpo   |      40.00 |             40 |        30.00 |           4 |
| Chico   |      60.00 |             60 |        30.00 |           4 |
+---------+------------+----------------+--------------+-------------+

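Even for this combined query, the EXPLAIN doesn't look too bad. There is no temporary table, filesort, or join buffer, and it makes good use of the covering indexes.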

+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref       | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL        | NULL    | NULL      |    3 | NULL        |
|  1 | PRIMARY     | <derived3> | ref   | <auto_key0>   | <auto_key0> | 4       | subq_i.id |    2 | NULL        |
|  3 | DERIVED     | p          | index | PRIMARY       | PRIMARY     | 4       | NULL      |    3 | Using index |
|  3 | DERIVED     | e          | ref   | person_id     | person_id   | 5       | test.p.id |    1 | Using index |
|  2 | DERIVED     | p          | index | PRIMARY       | PRIMARY     | 4       | NULL      |    3 | NULL        |
|  2 | DERIVED     | i          | ref   | person_id     | person_id   | 5       | test.p.id |    1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-----------+------+-------------+
answered 2013-07-04T18:22:34.727
0

Maybe you can skip the JOIN entirely.

SELECT person_id
     , MIN(name) AS name
     , SUM(income_sum) AS income_sum
     , SUM(work_hours_sum) AS work_hours_sum
     , SUM(expenses_sum) AS expenses_sum
     , SUM(items_count) AS items_count
FROM (
SELECT id AS person_id
     , name
     , NULL AS income_sum
     , NULL AS work_hours_sum
     , NULL AS expenses_sum
     , NULL AS items_count
  FROM people
UNION ALL
SELECT person_id
     , NULL AS name
     , sum(amount) AS income_sum
     , sum(number_of_hours_for_amount) AS work_hours_sum
     , NULL AS expenses_sum
     , NULL AS items_count
  FROM income
 GROUP BY person_id
UNION ALL
SELECT person_id
     , NULL AS name
     , NULL AS income_sum
     , NULL AS work_hours_sum
     , sum(amount) AS expenses_sum
     , sum(number_of_items_bought) AS items_count
  FROM expenses
 GROUP BY person_id
) as d
WHERE person_id IS NOT NULL -- my sql generates this row
 GROUP BY person_id
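
As a quick sanity check of this idea, here is a small sketch that runs a UNION ALL query of the same shape next to a LEFT JOIN of pre-aggregated subqueries and compares the results. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the rows (Alice, and Bob who has no income) are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE income (person_id INTEGER, amount NUMERIC,
                         number_of_hours_for_amount INTEGER);
    CREATE TABLE expenses (person_id INTEGER, amount NUMERIC,
                           number_of_items_bought INTEGER);
    INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO income VALUES (1, 10, 5), (1, 20, 8);
    INSERT INTO expenses VALUES (1, 7, 1), (2, 3, 2), (2, 5, 4);
""")

# UNION ALL approach: stack the three sources, then collapse per person.
union_rows = conn.execute("""
    SELECT MIN(name), SUM(income_sum), SUM(work_hours_sum),
           SUM(expenses_sum), SUM(items_count)
    FROM (
        SELECT id AS person_id, name, NULL AS income_sum,
               NULL AS work_hours_sum, NULL AS expenses_sum,
               NULL AS items_count
        FROM people
        UNION ALL
        SELECT person_id, NULL, SUM(amount),
               SUM(number_of_hours_for_amount), NULL, NULL
        FROM income GROUP BY person_id
        UNION ALL
        SELECT person_id, NULL, NULL, NULL, SUM(amount),
               SUM(number_of_items_bought)
        FROM expenses GROUP BY person_id
    ) AS d
    GROUP BY person_id
    ORDER BY person_id
""").fetchall()

# Reference: LEFT JOIN people to each pre-aggregated child table.
join_rows = conn.execute("""
    SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
    FROM people
    LEFT JOIN (SELECT person_id, SUM(amount) AS income_sum,
                      SUM(number_of_hours_for_amount) AS work_hours_sum
               FROM income GROUP BY person_id) AS i
           ON people.id = i.person_id
    LEFT JOIN (SELECT person_id, SUM(amount) AS expenses_sum,
                      SUM(number_of_items_bought) AS items_count
               FROM expenses GROUP BY person_id) AS e
           ON people.id = e.person_id
    ORDER BY people.id
""").fetchall()

print(union_rows)
print(join_rows)
```

Both forms return NULL (None in Python) for a person with no rows in a child table. If you would rather see zeros, wrapping the sums in COALESCE(..., 0) is one option.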
answered 2013-07-07T16:56:17.853
0

Something like this should get you very close:

select id, name, (select sum(amount) from income i where i.person_id = p.id) as 'total_income_amount',
                 (select sum(number_of_hours_for_amount) from income i where i.person_id = p.id) as 'total_number_of_hours_for_amount',
                 (select sum(amount) from expenses e where e.person_id = p.id) as 'total_expenses_amount',
                 (select sum(number_of_items_bought) from expenses e where e.person_id = p.id) as 'total_number_of_items_bought'
from   people p;
answered 2013-07-04T17:56:07.573
0

Try this. Both joins should use people.id.

SELECT name, income_sum, work_hours_sum, expenses_sum, items_count
FROM people

LEFT JOIN 
     (SELECT person_id, sum(amount) as income_sum, 
             sum(number_of_hours_for_amount) as work_hours_sum
      FROM income
      GROUP BY person_id) as income_subquery
ON people.id = income_subquery.person_id

LEFT JOIN
     (SELECT person_id, sum(amount) as expenses_sum, 
             sum(number_of_items_bought) as items_count
      FROM expenses
      GROUP BY person_id) as expenses_subquery
ON people.id = expenses_subquery.person_id

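Ideally, a good query optimizer would recognize that your original SQL is equivalent to this. But you are using MySQL, so I wouldn't expect ideal optimization.

Make sure you have indexes on income.person_id and expenses.person_id so that the grouping in the subqueries is efficient.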

answered 2013-07-04T17:38:00.177