mysql - Unable to optimize a MySQL query that uses an ORDER BY clause

Question

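I am using Drupal 6 with MySQL version 5.0.95 and have hit an impasse: one of my queries, which displays content based on the most recent article date, has become slow, and because it runs so often it is dragging down the performance of the whole site. The query in question is below: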

     SELECT n.nid, 
            n.title, 
            ma.field_article_date_format_value, 
            ma.field_article_summary_value
       FROM node n 
 INNER JOIN content_type_article ma ON n.nid=ma.nid
 INNER JOIN term_node tn            ON n.nid=tn.nid 
      WHERE tn.tid= 153 
        AND n.status=1 
   ORDER BY ma.field_article_date_format_value DESC 
      LIMIT 0, 11;

EXPLAIN on the query shows the following result:

+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys            | key     | key_len | ref                  | rows  | Extra                           |
+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+
|  1 | SIMPLE      | tn    | ref    | PRIMARY,nid              | PRIMARY | 4       | const                | 19006 | Using temporary; Using filesort |
|  1 | SIMPLE      | ma    | ref    | nid,ix_article_date      | nid     | 4       | drupal_mm_stg.tn.nid |     1 |                                 |
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,node_status_type | PRIMARY | 4       | drupal_mm_stg.ma.nid |     1 | Using where                     |
+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+

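The query looks relatively simple and straightforward: it retrieves articles that belong to category (term) 153 and have status 1 (published). But apparently Using temporary and Using filesort mean the query is bound to perform badly, from what I have learned by reading about it.

Removing field_article_date_format_value from the ORDER BY clause gets rid of Using temporary; Using filesort and reduces the query execution time, but the ordering is required and cannot be traded away. Unfortunately, the same is true of site performance.

My hunch is that most of the trouble comes from the term_node table, which maps articles to categories. It is a many-to-many relationship table, meaning that if article X is associated with 5 categories C1..C5, it will have 5 entries in that table. This table comes from out-of-the-box Drupal.

Dealing with heavy database content is new to me. After going through some similar questions ("When ordering by date desc, 'Using temporary' slows down query" and "MySQL performance optimization: order by datetime field"), I tried creating a composite index on content_type_article that covers the datetime field used in the ORDER BY clause together with another key (nid), and then tried to FORCE INDEX.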

    SELECT n.nid, n.title,
           ma.field_article_date_format_value, 
           ma.field_article_summary_value 
      FROM node n 
INNER JOIN content_type_article ma FORCE INDEX (ix_article_date) ON n.nid=ma.nid 
INNER JOIN term_node tn ON n.nid=tn.nid 
     WHERE tn.tid= 153 
       AND n.status=1 
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;

The result, and the following EXPLAIN output, did not seem to help much:

+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys            | key             | key_len | ref                  | rows  | Extra                           |
+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+
|  1 | SIMPLE      | tn    | ref    | PRIMARY,nid              | PRIMARY         | 4       | const                | 18748 | Using temporary; Using filesort |
|  1 | SIMPLE      | ma    | ref    | ix_article_date          | ix_article_date | 4       | drupal_mm_stg.tn.nid |     1 |                                 |
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,node_status_type | PRIMARY         | 4       | drupal_mm_stg.ma.nid |     1 | Using where                     |
+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+

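The fields n.nid, ca.nid and ma.field_article_date_format_value are all indexed. Querying the database with LIMIT 0,11 takes approximately 7-10 seconds with the ORDER BY clause, but without it the query takes barely a second. The database engine is MyISAM. Any help on this would be greatly appreciated.

Any answer that helps me get this query to run like a normal query (at the same speed as the query without the sort by date) would be great. My attempt at creating a composite index on the combination of nid and field_article_date_format_value and using it in the query did not help. I am happy to provide more information about the problem and open to any fresh suggestions.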

score 6 · Accepted Answer

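Looking at your query and the EXPLAIN output, it seems that having n.status=1 in the WHERE clause makes the search very inefficient, because the whole set defined by the joins has to be produced before status = 1 is applied. Try starting the join from the term_node table, which the WHERE clause filters immediately, and then have the join apply the status condition right away. Give it a try and please let me know how it goes.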

 SELECT n.nid, n.title,
           ma.field_article_date_format_value, 
           ma.field_article_summary_value 
      FROM term_node tn
INNER JOIN node n ON n.nid=tn.nid AND n.status=1
INNER JOIN content_type_article ma FORCE INDEX (ix_article_date) ON n.nid=ma.nid 
     WHERE tn.tid= 153 
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;

score 4 · Accepted Answer

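1) Covering index

I think the simple answer may be a "covering index".

Especially on the content_type_article table. The "covering index" has the expression from the ORDER BY as its leading column and includes all of the columns referenced by the query. Here is the index I created (on my test table):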

CREATE INDEX ct_article_ix9 
    ON content_type_article 
       (field_article_date_format_value, nid, field_article_summary_value);

Here is an excerpt of the EXPLAIN I get from the query (after building example tables using the InnoDB engine, with a covering index on each table):

_type  table type  key              ref          Extra                     
------ ----- ----- --------------   -----------  ------------------------
SIMPLE  ma   index ct_article_ix9   NULL         Using index
SIMPLE  n    ref   node_ix9         ma.nid       Using where; Using index
SIMPLE  tn   ref   term_node_ix9    n.nid,const  Using where; Using index

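Note that no 'Using filesort' is shown in the plan, and that the plan shows 'Using index' for every table referenced in the query. That basically means all of the data the query needs is retrieved from index pages, with no need to reference any pages from the underlying tables. (Your tables have many more rows than my test tables, but if you can get an EXPLAIN plan that looks like this, you may get better performance.)

For completeness, here is the entire EXPLAIN output: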

+----+-------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key            | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | ma    | index | NULL          | ct_article_ix9 | 27      | NULL                |    1 | Using index              |
|  1 | SIMPLE      | n     | ref   | node_ix9      | node_ix9       | 10      | testps.ma.nid,const |   11 | Using where; Using index |
|  1 | SIMPLE      | tn    | ref   | term_node_ix9 | term_node_ix9  | 10      | testps.n.nid,const  |   11 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
3 rows in set (0.00 sec)

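I made no changes to your query other than omitting the FORCE INDEX hint. Here are the other two "covering indexes" I created on the other two tables referenced in the query: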

CREATE INDEX node_ix9
    ON node (`nid`,`status`,`title`);

CREATE INDEX term_node_ix9
    ON term_node (nid,tid);

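(Note that if nid is the clustering key on the node table, then you may not need the covering index on the node table.)

2) Use correlated subqueries in place of joins?

If the preceding idea does not improve anything, then, as another alternative, since the original query returns at most 11 rows, you could try rewriting the query to avoid the join operations and use correlated subqueries instead. Something like the query below.

Note that this query differs significantly from the original query. The difference is that with this query, a row from the content_type_article table will be returned only once. With the query that uses joins, a row from that table can match multiple rows from the node and term_node tables, which would return that same row more than once. This may be seen as either desirable or undesirable; it really depends on the cardinality and on whether the result set meets the specification.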

 SELECT ( SELECT n2.nid
            FROM node n2 
           WHERE n2.nid = ma.nid
             AND n2.status = 1
           LIMIT 1
        ) AS `nid`
      , ( SELECT n3.title 
            FROM node n3
           WHERE n3.nid = ma.nid
             AND n3.status = 1
           LIMIT 1
        ) AS `title`
      , ma.field_article_date_format_value
      , ma.field_article_summary_value
   FROM content_type_article ma
  WHERE EXISTS 
        ( SELECT 1
            FROM node n1
           WHERE n1.nid = ma.nid
             AND n1.status = 1
         )                 
     AND EXISTS
         ( SELECT 1
             FROM term_node tn
            WHERE tn.nid = ma.nid
             AND tn.tid = 153
         )
   ORDER BY ma.field_article_date_format_value DESC
   LIMIT 0,11

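(Sometimes a query that uses this kind of "correlated subquery" can perform significantly worse than an equivalent query that does join operations. But in some cases a query like this can actually perform better, especially given the very limited number of rows being returned.)

Here is the EXPLAIN output for that query: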

+----+--------------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key            | key_len | ref                 | rows | Extra                    |
+----+--------------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
|  1 | PRIMARY            | ma    | index | NULL          | ct_article_ix9 | 27      | NULL                |   11 | Using where; Using index |
|  5 | DEPENDENT SUBQUERY | tn    | ref   | term_node_ix9 | term_node_ix9  | 10      | testps.ma.nid,const |   13 | Using where; Using index |
|  4 | DEPENDENT SUBQUERY | n1    | ref   | node_ix9      | node_ix9       | 10      | testps.ma.nid,const |   12 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | n3    | ref   | node_ix9      | node_ix9       | 10      | testps.ma.nid,const |   12 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | n2    | ref   | node_ix9      | node_ix9       | 10      | testps.ma.nid,const |   12 | Using where; Using index |
+----+--------------------+-------+-------+---------------+----------------+---------+---------------------+------+--------------------------+
5 rows in set (0.00 sec)

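Note again that each access is 'Using index', which means the query is satisfied directly from index pages, without having to visit any data pages in the underlying tables.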
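Whether the join form really returns the same article more than once depends on the data. A minimal sketch of one way to check, using the table and column names from the question (the assumption here is that term_node may hold more than one row for the same nid under term 153, for example one row per revision), is to count the matches per article:

```sql
-- Articles that appear more than once under term 153 in term_node.
-- Any nid listed here would be repeated by the INNER JOIN form of the
-- query, but returned only once by the EXISTS form.
SELECT tn.nid,
       COUNT(*) AS matches
  FROM term_node tn
 WHERE tn.tid = 153
 GROUP BY tn.nid
HAVING COUNT(*) > 1
 ORDER BY matches DESC
 LIMIT 20;
```

An empty result suggests that, as far as term_node is concerned, the two forms return the same rows for this term. The same kind of check can be run against node and content_type_article.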

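Example tables

Here are the example tables (along with the indexes) that I built and populated, based on the information in your question: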

CREATE TABLE `node` (`id` INT PRIMARY KEY, `nid` INT, `title` VARCHAR(10),`status` INT);
CREATE INDEX node_ix9 ON node (`nid`,`status`,`title`);
INSERT INTO `node` VALUES (1,1,'foo',1),(2,2,'bar',0),(3,3,'fee',1),(4,4,'fi',0),(5,5,'fo',1),(6,6,'fum',0),(7,7,'derp',1);
INSERT INTO `node` SELECT id+7,nid+7,title,`status` FROM node;
INSERT INTO `node` SELECT id+14,nid+14,title,`status` FROM node;
INSERT INTO `node` SELECT id+28,nid+28,title,`status` FROM node;
INSERT INTO `node` SELECT id+56,nid+56,title,`status` FROM node;

CREATE TABLE content_type_article (id INT PRIMARY KEY, nid INT, field_article_date_format_value DATETIME, field_article_summary_value VARCHAR(10));
CREATE INDEX ct_article_ix9 ON content_type_article (field_article_date_format_value, nid, field_article_summary_value);
INSERT INTO content_type_article VALUES (1001,1,'2012-01-01','foo'),(1002,2,'2012-01-02','bar'),(1003,3,'2012-01-03','fee'),(1004,4,'2012-01-04','fi'),(1005,5,'2012-01-05','fo'),(1006,6,'2012-01-06','fum'),(1007,7,'2012-01-07','derp');
INSERT INTO content_type_article SELECT id+7,nid+7, DATE_ADD(field_article_date_format_value,INTERVAL 7 DAY),field_article_summary_value FROM content_type_article;
INSERT INTO content_type_article SELECT id+14,nid+14, DATE_ADD(field_article_date_format_value,INTERVAL 14 DAY),field_article_summary_value FROM content_type_article;
INSERT INTO content_type_article SELECT id+28,nid+28, DATE_ADD(field_article_date_format_value,INTERVAL 28 DAY),field_article_summary_value FROM content_type_article;
INSERT INTO content_type_article SELECT id+56,nid+56, DATE_ADD(field_article_date_format_value,INTERVAL 56 DAY),field_article_summary_value FROM content_type_article;

CREATE TABLE term_node (id INT, tid INT, nid INT);
CREATE INDEX term_node_ix9 ON term_node (nid,tid);
INSERT INTO term_node VALUES (2001,153,1),(2002,153,2),(2003,153,3),(2004,153,4),(2005,153,5),(2006,153,6),(2007,153,7);
INSERT INTO term_node SELECT id+7, tid, nid+7 FROM term_node;
INSERT INTO term_node SELECT id+14, tid, nid+14 FROM term_node;
INSERT INTO term_node SELECT id+28, tid, nid+28 FROM term_node;
INSERT INTO term_node SELECT id+56, tid, nid+56 FROM term_node;

score 4 · Accepted Answer

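Using temporary; Using filesort only means that MySQL needs to construct a temporary result table and sort it to get the result you need. This is often a consequence of the ORDER BY ... DESC LIMIT 0,n construct you are using to get the latest postings. In itself it is not a sign of failure. See this: http://www.mysqlperformanceblog.com/2009/03/05/what-does-using-filesort-mean-in-mysql/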
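To see how much sorting and temporary-table work the query actually causes, rather than inferring it from EXPLAIN alone, the session status counters can be inspected around the query. This is only a sketch: it assumes the account has the RELOAD privilege that FLUSH STATUS requires, and it reuses the query from the question unchanged.

```sql
-- Reset the session counters so the numbers below reflect this query only.
FLUSH STATUS;

SELECT n.nid,
       n.title,
       ma.field_article_date_format_value,
       ma.field_article_summary_value
  FROM node n
INNER JOIN content_type_article ma ON n.nid = ma.nid
INNER JOIN term_node tn            ON n.nid = tn.nid
 WHERE tn.tid = 153
   AND n.status = 1
 ORDER BY ma.field_article_date_format_value DESC
 LIMIT 0, 11;

-- Sort_rows: how many rows were sorted.
-- Sort_merge_passes: above zero means the sort did not fit in the sort buffer.
SHOW SESSION STATUS LIKE 'Sort%';

-- Created_tmp_disk_tables: above zero means a temporary table went to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

A large Sort_rows value next to a LIMIT of 11 indicates that the cost lies in sorting the full matching set, which is what the suggestions below try to reduce.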

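Here are some things to try. I am not totally sure they will work; it is hard to know without having your data to experiment with.

Is there a BTREE index on content_type_article.field_article_date_format_value? If so, that may help.

Do you HAVE to display the 11 most recent articles? Or could you display the 11 most recent articles that have appeared in the last week or month? If so, you could add this line to your WHERE clause. It would filter your rows by date rather than looking all the way back to the beginning of time for matching articles. This will be especially helpful if you have a long-established Drupal site.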

   AND ma.field_article_date_format_value >= (NOW() - INTERVAL 1 MONTH)

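First, try flipping the order of the INNER JOIN operations. Second, incorporate tid=153 into the join criterion. This may reduce the size of the temporary table that has to be sorted. All together, my suggestions are as follows: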

    SELECT n.nid, 
           n.title, 
           ma.field_article_date_format_value, 
           ma.field_article_summary_value
      FROM node n 
INNER JOIN term_node tn            ON (n.nid=tn.nid AND tn.tid = 153) 
INNER JOIN content_type_article ma ON n.nid=ma.nid
     WHERE n.status=1 
       AND ma.field_article_date_format_value >= (NOW() - INTERVAL 1 MONTH)
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;


score 2 · Accepted Answer

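MySQL is "optimizing" your query so that it selects from the term_node table first, even though you are specifying node first. Without knowing the data, I am not sure which order is optimal. The term_node table is definitely where your performance problem is, since about 19,000 records are being selected from it.

A LIMIT without an ORDER BY is almost always faster, because MySQL stops as soon as it has found the specified number of rows. With an ORDER BY, it first has to find all the records and sort them, and only then apply the limit.

The simple thing to try is moving your WHERE condition into the JOIN clause, which is where it belongs. That filter is specific to the table being joined. This will make sure MySQL does not optimize it incorrectly.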

INNER JOIN term_node tn ON n.nid=tn.nid AND tn.tid=153

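The more complicated thing to try is doing a SELECT on the term_node table and joining against that. This is called a derived table, and you will see it identified as such in the EXPLAIN output. Since you said the relationship is many-to-many, I added DISTINCT to reduce the number of records to join against.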

SELECT ...
FROM node n
INNER JOIN content_type_article ma FORCE INDEX (ix_article_date) ON n.nid=ma.nid
INNER JOIN (SELECT DISTINCT nid FROM term_node WHERE tid=153) tn ON n.nid=tn.nid
WHERE n.status=1
ORDER BY ma.field_article_date_format_value DESC 
LIMIT 0,11

MySQL 5.0 has some limitations with derived tables, so this may not work. There are workarounds, though.
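One possible workaround, sketched here as an assumption rather than as the approach the answer had in mind, is to materialise the filtered term_node rows into an indexed temporary table and join against that. A derived table in MySQL 5.0 is materialised without any index, whereas an explicit temporary table can carry a primary key. The table name tmp_term_nids is hypothetical, and the INT UNSIGNED type for nid is assumed from a stock Drupal 6 schema.

```sql
-- Hypothetical helper table; the nid type is an assumption.
CREATE TEMPORARY TABLE tmp_term_nids (
    nid INT UNSIGNED NOT NULL,
    PRIMARY KEY (nid)
);

-- DISTINCT keeps the primary key from rejecting repeated nids.
INSERT INTO tmp_term_nids (nid)
SELECT DISTINCT nid
  FROM term_node
 WHERE tid = 153;

-- Same result columns as the original query, joined via the helper table.
SELECT n.nid,
       n.title,
       ma.field_article_date_format_value,
       ma.field_article_summary_value
  FROM tmp_term_nids t
INNER JOIN node n                  ON n.nid = t.nid AND n.status = 1
INNER JOIN content_type_article ma ON ma.nid = n.nid
 ORDER BY ma.field_article_date_format_value DESC
 LIMIT 0, 11;

DROP TEMPORARY TABLE tmp_term_nids;
```

This still sorts the matching articles, so it is worth comparing its EXPLAIN output and timing against the derived-table version before adopting it.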

score 1 · Accepted Answer

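If you can, you really want to avoid the sort operation altogether by taking advantage of a pre-sorted index.

To find out whether this is possible, imagine your data denormalised into a single table, and make sure that everything that must go in the WHERE clause can be specified with a SINGLE VALUE. For example, if you had to use an IN clause on one of the columns, then sorting would be unavoidable.

Here is a screenshot of some sample data:

[Image: sample data denormalised and sorted by tid, status DESC, date DESC]

So, if you did have your data denormalised, you could query on tid and status using single values and then sort by date descending. That means the following index would work perfectly in that case: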

create index ix1 on denormalisedtable(tid, status, date desc);

If you had this, your query would only hit the top 10 rows and would never need to sort.
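As a sketch of what that would look like, here is a query against the hypothetical denormalisedtable. The columns nid, title and summary are assumptions added for illustration; only tid, status and date come from the index definition above.

```sql
-- Both leading index columns are pinned to a single value, so the rows
-- for (tid = 153, status = 1) are already stored in date order inside
-- the index and can be read newest-first without a filesort.
SELECT nid,
       title,
       `date`,
       summary
  FROM denormalisedtable
 WHERE tid = 153            -- single value
   AND status = 1           -- single value
 ORDER BY `date` DESC       -- remaining index column
 LIMIT 0, 11;
```

Note that MySQL versions before 8.0 parse the DESC keyword in an index definition but ignore it. The index is stored in ascending order and scanned backwards for a descending ORDER BY, which is sufficient for this query.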

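So, how do you get the same performance without denormalising?

I think you should be able to use the STRAIGHT_JOIN clause to force the order in which MySQL selects from the tables. You want it to select LAST from the table you are sorting on.

Try this: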

SELECT n.nid, 
        n.title, 
        ma.field_article_date_format_value, 
        ma.field_article_summary_value
FROM node n 
STRAIGHT_JOIN term_node tn            ON n.nid=tn.nid 
STRAIGHT_JOIN content_type_article ma ON n.nid=ma.nid
WHERE tn.tid= 153 
    AND n.status=1 
ORDER BY ma.field_article_date_format_value DESC 
LIMIT 0, 11;

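The idea is to get MySQL to select from the node table, then from the term_node table, and finally from the content_type_article table (the table containing the column you are sorting on).

This last join is your most important one, and you want it to use an index so that the LIMIT clause can work without needing to sort the data.

This single index might do the trick: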

create index ix1 on content_type_article(nid, field_article_date_format_value desc);

or

create index ix1 on content_type_article(nid, field_article_date_format_value desc, field_article_summary_value);

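(for a covering index)

I say might because I do not know enough about the MySQL optimiser to know whether it is clever enough to handle the multiple nid values that will be fed into content_type_article without needing to sort the data.

Logically, it should be able to work quickly. For example, if 5 nid values are fed into the final content_type_article table, it should be able to get the top 10 for each one directly from the index, merge the results together, and then pick the final top 10. That would mean a total of 50 rows read from that table instead of the full 19006 you are seeing currently.

Let me know how it goes.

If it works for you, further optimisation is possible by using covering indexes on the other tables to speed up the first two joins.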
