mysql - Optimizing a MySQL GROUP BY query

Question

I have three tables: categories, articles, and article_events, with the following structure:

categories: id, name                        (100,000 rows)
articles: id, category_id                   (6000 rows)
article_events: id, article_id, status_id   (20,000 rows)

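For each article, the row with the highest article_events.id describes that article's current status.

I am returning a list of categories along with the number of articles in each one whose most recent event has a status_id of 1.

What I have so far works, but it is fairly slow (10 seconds) for the size of my tables. I'm wondering whether there is a way to make it faster. As far as I know, all of the tables have proper indexes.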

SELECT c.id, 
       c.name, 
       SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN (
    SELECT article_id, MAX(id) event_id
    FROM article_events
    GROUP BY article_id
) most_recent ON most_recent.article_id = a.id
LEFT JOIN article_events e ON most_recent.event_id = e.id
GROUP BY c.id

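Basically, I have to join the events table twice, because asking for status_id together with MAX(id) just returns the first status_id it finds, not the one associated with the MAX(id) row.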
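
A minimal sketch of the behavior described above, using a hypothetical `article_events_demo` table with made-up rows (neither is from the original post). Selecting `status_id` next to `MAX(id)` in one grouped query does not tie `status_id` to the row holding the maximum; joining back on the maximum `id` does.

```sql
-- Hypothetical demo table and rows, for illustration only.
CREATE TABLE article_events_demo (
    id         INT PRIMARY KEY,
    article_id INT NOT NULL,
    status_id  INT NOT NULL
);

INSERT INTO article_events_demo (id, article_id, status_id) VALUES
    (1, 10, 1),   -- older event for article 10
    (2, 10, 2);   -- newest event for article 10

-- Unreliable: status_id is not in the GROUP BY, so MySQL may take it from any
-- row in the group (and rejects the query when ONLY_FULL_GROUP_BY is enabled).
-- SELECT article_id, MAX(id), status_id
-- FROM article_events_demo
-- GROUP BY article_id;

-- Reliable: find the newest id per article, then join back to read its status.
SELECT e.article_id, e.id AS event_id, e.status_id
FROM article_events_demo e
JOIN (
    SELECT article_id, MAX(id) AS event_id
    FROM article_events_demo
    GROUP BY article_id
) most_recent ON most_recent.event_id = e.id;
-- Returns one row: article_id 10, event_id 2, status_id 2.
```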

Is there any way to make this better, or do I just have to live with 10 seconds? Thanks!

Edit:

Here is the EXPLAIN output for my query:

ID | select_type | table          | type   | possible_keys | key         | key_len | ref                  | rows   | Extra 
---------------------------------------------------------------------------------------------------------------------------
1  | PRIMARY     | c              | index  | NULL          | PRIMARY     | 4       | NULL                 | 124044 | Using index; Using temporary; Using filesort
1  | PRIMARY     | a              | ref    | category_id   | category_id | 4       | c.id                 | 3      |
1  | PRIMARY     | <derived2>     | ALL    | NULL          | NULL        | NULL    | NULL                 | 6351   |
1  | PRIMARY     | e              | eq_ref | PRIMARY       | PRIMARY     | 4       | most_recent.event_id | 1      |
2  | DERIVED     | article_events | ALL    | NULL          | NULL        | NULL    | NULL                 | 19743  | Using temporary; Using filesort

score 1 · Accepted Answer

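If you can eliminate the subquery by using JOINs, the query will usually perform better, because derived tables cannot use indexes. Here is your query without the subquery: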

SELECT c.id, 
       c.name, 
       COUNT(ae1.article_id) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN article_events ae1
  ON ae1.article_id = a.id
LEFT JOIN article_events ae2
  ON ae2.article_id = a.id
  AND ae2.id > ae1.id
WHERE ae2.id IS NULL
GROUP BY c.id
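
The question asks for only those articles whose latest event has `status_id` 1, while the query above counts every article that has a latest event. One possible adjustment, offered as an untested sketch under the same table and column names, is to keep the anti-join and move the status check into the aggregate:

```sql
SELECT c.id,
       c.name,
       -- COUNT ignores NULLs, so only latest events with status 1 are counted
       COUNT(CASE WHEN ae1.status_id = 1 THEN 1 END) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN article_events ae1
  ON ae1.article_id = a.id
LEFT JOIN article_events ae2
  ON ae2.article_id = a.id
  AND ae2.id > ae1.id      -- look for any newer event for the same article
WHERE ae2.id IS NULL       -- keep ae1 only when no newer event exists
GROUP BY c.id;
```

Putting `ae1.status_id = 1` into the `ON` clause instead would change which rows the anti-join compares and could drop categories from the result, so the check stays in the `SELECT` list.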

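You will want to experiment with indexes and test with EXPLAIN, but here is my guess (I am assuming the id columns are primary keys and that you are using InnoDB):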

categories: `name`
articles: `category_id`
article_events: (`article_id`, `id`)
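
The index guesses above could be created roughly as follows. The index names are made up for illustration, and an index on `articles.category_id` already appears in the question's EXPLAIN output (key `category_id`), so that statement is left commented out.

```sql
-- Index names (idx_*) are hypothetical; pick names that fit your schema.
CREATE INDEX idx_categories_name ON categories (name);

-- Likely already present, judging by the `category_id` key in the EXPLAIN output.
-- CREATE INDEX idx_articles_category_id ON articles (category_id);

-- Lets MySQL look up one article's events by article_id and compare
-- their ids within the same index.
CREATE INDEX idx_article_events_article_id_id ON article_events (article_id, id);
```

Running EXPLAIN on the query before and after creating them shows whether the optimizer actually picks them up.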

score 0 · Accepted Answer

I haven't tried this, but I think it will save the database some work:

SELECT ae.article_id AS ref_article_id, 
    MAX(ae.id) event_id, 
    ae.status_id,
    (select a.category_id from articles a where a.id = ref_article_id) AS cat_id,
    (select c.name from categories c where c.id = cat_id) AS cat_name
FROM article_events ae
GROUP BY ae.article_id

Hope that helps.

Edit:

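By the way, keep in mind that a join has to go through every row, so if you can help it, you should start your selection from the small end and work your way up. In this case, the query has to go through 100,000 records and join each one, then join those 100,000 records again and again. Even where the values are NULL, it still has to go through those records.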
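
One way to read this advice, as an untested sketch: compute the per-category counts from the smaller `articles` and `article_events` tables first, then attach them to `categories`. The `counts` alias and the `COALESCE` default are additions for illustration. Whether this beats the other queries depends on the MySQL version and the available indexes, so check it with EXPLAIN.

```sql
SELECT c.id,
       c.name,
       COALESCE(counts.article_count, 0) AS article_count  -- 0 for categories with no match
FROM categories c
LEFT JOIN (
    -- Work on the small tables first: one row per category that has
    -- at least one article whose newest event has status 1.
    SELECT a.category_id, COUNT(*) AS article_count
    FROM articles a
    JOIN article_events e ON e.article_id = a.id
    WHERE e.status_id = 1
      AND e.id = (SELECT MAX(e2.id)
                  FROM article_events e2
                  WHERE e2.article_id = a.id)
    GROUP BY a.category_id
) counts ON counts.category_id = c.id;
```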

Hope this all helps...

score 0 · Accepted Answer

I don't like that the index on categories.id is being used, since you are selecting the whole table.

Try running:

ANALYZE TABLE categories;
ANALYZE TABLE article_events;

and then rerun the query.
