php - Trouble optimizing a MySQL query with GROUP BY ... HAVING

Question

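I'm trying to quickly optimize the search function of some outdated forum software written in PHP. My work boils down to a query that looks like this: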

SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;

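word1 and word2 are examples; there can be any number of words. The number in the HAVING clause (2 here) is the total number of search words. The idea is that a thread must contain every word in the search query, spread across any number of posts.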
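Because the HAVING constant must always equal the number of distinct search words, it is easier to generate the query than to edit it by hand. A minimal sketch of that, assuming Python's standard sqlite3 module and a small hypothetical dataset standing in for the real MySQL tables (build_search_query, search and the sample rows are illustrative, not part of the forum software):

```python
import sqlite3

# Hypothetical toy data: only the columns the search query touches.
SCHEMA = """
CREATE TABLE thread    (threadid INTEGER PRIMARY KEY);
CREATE TABLE word      (wordid INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE post      (postid INTEGER PRIMARY KEY, threadid INTEGER);
CREATE TABLE postindex (wordid INTEGER, postid INTEGER, UNIQUE (wordid, postid));
INSERT INTO thread VALUES (1), (2), (3);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
-- Thread 1: word1 and word2 in different posts. Thread 2: word1 and word3.
-- Thread 3: word1 and word2 in the same post.
INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def build_search_query(n_words):
    """The question's GROUP BY ... HAVING query with one LIKE per word."""
    word_filter = " OR ".join(["word.title LIKE ?"] * n_words)
    return f"""
        SELECT thread.threadid
        FROM thread AS thread
        INNER JOIN word AS word ON ({word_filter})
        INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
        INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
        WHERE thread.threadid = postquery.threadid
        GROUP BY thread.threadid
        HAVING COUNT(DISTINCT word.wordid) = ?
    """


def search(conn, words):
    # Deduplicate so the HAVING constant matches the number of distinct words.
    words = list(dict.fromkeys(words))
    if not words:
        return []
    sql = build_search_query(len(words))
    # Parameters: one per LIKE, then the HAVING constant.
    rows = conn.execute(sql, [*words, len(words)]).fetchall()
    return sorted(row[0] for row in rows)


conn = make_db()
print(search(conn, ["word1", "word2"]))  # prints [1, 3]
```

Binding the words as parameters also keeps user input out of the SQL text, although LIKE still treats `%` and `_` in a search word as wildcards.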

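Even with just two words, this query often runs for more than 60 seconds and times out. I'm stumped; I don't know how to optimize this terrible search engine any further.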

As far as I can tell, everything is properly indexed, and I recently ran ANALYZE. Most of the database runs on InnoDB. Here is the output of EXPLAIN:

+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table     | type   | possible_keys                                                                          | key     | key_len | ref                          | rows | Extra                                                     |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | word      | range  | PRIMARY,title                                                                          | title   | 150     | NULL                         |    2 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | postindex | ref    | wordid,temp_ix                                                                         | temp_ix | 4       | database1.word.wordid        |    3 | Using index condition                                     |
|  1 | SIMPLE      | postquery | eq_ref | PRIMARY,threadid,showthread                                                            | PRIMARY | 4       | database1.postindex.postid   |    1 | NULL                                                      |
|  1 | SIMPLE      | thread    | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4       | database1.postquery.threadid |    1 | Using index                                               |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+

Update

LIMIT 25 doesn't seem to help much. It might shave a second off a query that usually returns hundreds of results.

Clarification

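The part that makes MySQL slow is the GROUP BY ... HAVING ... bit. With the GROUP BY, LIMIT does almost nothing for performance. Without the GROUP BY, the query is fairly fast as long as the LIMIT is kept.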

SQL information

Output of SHOW CREATE TABLE postindex;:

CREATE TABLE `postindex` (
  `wordid` int(10) unsigned NOT NULL DEFAULT '0',
  `postid` int(10) unsigned NOT NULL DEFAULT '0',
  `intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
  `score` smallint(5) unsigned NOT NULL DEFAULT '0',
  UNIQUE KEY `wordid` (`wordid`,`postid`),
  KEY `temp_ix` (`wordid`),
  KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

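I didn't create the table, so I don't know why there is a duplicate index on wordid; however, I'm reluctant to drop it because this is old, temperamental software.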

score 1 · Accepted Answer

You can try several rewrites and compare their execution plans and timings.

Using 2 EXISTS subqueries (one for each word to check):

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word1'
          AND t.threadid = p.threadid
      )
  AND EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word2'
          AND t.threadid = p.threadid
      ) ;
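The two-EXISTS rewrite generalizes to any number of words by emitting one EXISTS subquery per word. A minimal sketch, assuming Python's standard sqlite3 module and a small hypothetical dataset in place of the real MySQL tables (build_exists_query, search_exists and the sample rows are illustrative). It only checks which threads come back; SQLite's plans and timings say nothing about MySQL's.

```python
import sqlite3

# Hypothetical toy data: only the columns the search query touches.
SCHEMA = """
CREATE TABLE thread    (threadid INTEGER PRIMARY KEY);
CREATE TABLE word      (wordid INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE post      (postid INTEGER PRIMARY KEY, threadid INTEGER);
CREATE TABLE postindex (wordid INTEGER, postid INTEGER, UNIQUE (wordid, postid));
INSERT INTO thread VALUES (1), (2), (3);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
"""

# One correlated EXISTS per search word. Each subquery has its own scope,
# so reusing the aliases p, pi and w in every copy is fine.
EXISTS_CLAUSE = """EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi ON pi.postid = p.postid
          JOIN word AS w ON pi.wordid = w.wordid
        WHERE w.title = ?
          AND t.threadid = p.threadid
      )"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def build_exists_query(n_words):
    if n_words < 1:
        raise ValueError("need at least one search word")
    conditions = "\n  AND ".join([EXISTS_CLAUSE] * n_words)
    return f"SELECT t.threadid\nFROM thread AS t\nWHERE {conditions}"


def search_exists(conn, words):
    # One bound parameter per EXISTS subquery; no HAVING count is needed.
    rows = conn.execute(build_exists_query(len(words)), words).fetchall()
    return sorted(row[0] for row in rows)


conn = make_db()
print(search_exists(conn, ["word1", "word2"]))  # prints [1, 3]
```

Unlike the GROUP BY ... HAVING version, there is no count to keep in sync, and a repeated word only repeats an EXISTS test.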

Using a single EXISTS subquery:

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p1
          JOIN postindex AS pi1
            ON  pi1.postid = p1.postid
          JOIN word AS w1
            ON  w1.wordid = pi1.wordid
            AND w1.title = 'word1'

          JOIN post AS p2
            ON  p2.threadid = p1.threadid
          JOIN postindex AS pi2
            ON  pi2.postid = p2.postid
          JOIN word AS w2
            ON  w2.wordid = pi2.wordid
            AND w2.title = 'word2'

        WHERE t.threadid = p1.threadid
          AND t.threadid = p2.threadid
      ) ;

A single query with multiple joins, with GROUP BY used only to remove duplicate threadid values:

SELECT t.threadid
FROM thread AS t

  JOIN post AS p1
    ON  p1.threadid = t.threadid
  JOIN postindex AS pi1
    ON  pi1.postid = p1.postid
  JOIN word AS w1
    ON  w1.wordid = pi1.wordid
    AND w1.title = 'word1'

  JOIN post AS p2
    ON  p2.threadid = t.threadid
  JOIN postindex AS pi2
    ON  pi2.postid = p2.postid
  JOIN word AS w2
    ON  w2.wordid = pi2.wordid
    AND w2.title = 'word2'

WHERE p1.threadid = p2.threadid        -- this line is redundant
GROUP BY t.threadid ;
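The multi-join rewrite also generalizes: each extra word adds another post/postindex/word alias chain tied to t.threadid. A minimal sketch, assuming Python's sqlite3 module and hypothetical sample data (build_join_query and search_joins are illustrative names, and SQLite stands in for MySQL only to check results, not plans).

```python
import sqlite3

# Hypothetical toy data: only the columns the search query touches.
SCHEMA = """
CREATE TABLE thread    (threadid INTEGER PRIMARY KEY);
CREATE TABLE word      (wordid INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE post      (postid INTEGER PRIMARY KEY, threadid INTEGER);
CREATE TABLE postindex (wordid INTEGER, postid INTEGER, UNIQUE (wordid, postid));
INSERT INTO thread VALUES (1), (2), (3);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def build_join_query(n_words):
    """Chain three joins (post, postindex, word) per search word."""
    if n_words < 1:
        raise ValueError("need at least one search word")
    parts = ["SELECT t.threadid", "FROM thread AS t"]
    for i in range(1, n_words + 1):
        parts += [
            f"  JOIN post AS p{i} ON p{i}.threadid = t.threadid",
            f"  JOIN postindex AS pi{i} ON pi{i}.postid = p{i}.postid",
            f"  JOIN word AS w{i} ON w{i}.wordid = pi{i}.wordid AND w{i}.title = ?",
        ]
    # GROUP BY only removes duplicate threadids produced by the joins.
    parts.append("GROUP BY t.threadid")
    return "\n".join(parts)


def search_joins(conn, words):
    rows = conn.execute(build_join_query(len(words)), words).fetchall()
    return sorted(row[0] for row in rows)


conn = make_db()
print(search_joins(conn, ["word1", "word2"]))  # prints [1, 3]
```

The joins produce one row per combination of matching posts, which is why the GROUP BY is still needed even though it no longer does any counting.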

score 0 · Accepted Answer

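I would first create a temporary table and store the distinct (thread.threadid, word.wordid) pairs that match your search. Then select thread.threadid where COUNT(*) equals the number of search terms.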
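A minimal sketch of the temporary-table approach, assuming Python's sqlite3 module and hypothetical sample rows in place of the real MySQL tables; the table name search_hits and the function name are illustrative, not from the answer.

```python
import sqlite3

# Hypothetical toy data: only the columns the search query touches.
SCHEMA = """
CREATE TABLE thread    (threadid INTEGER PRIMARY KEY);
CREATE TABLE word      (wordid INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE post      (postid INTEGER PRIMARY KEY, threadid INTEGER);
CREATE TABLE postindex (wordid INTEGER, postid INTEGER, UNIQUE (wordid, postid));
INSERT INTO thread VALUES (1), (2), (3);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search_with_temp_table(conn, words):
    words = list(dict.fromkeys(words))  # distinct search terms
    if not words:
        return []
    placeholders = ", ".join(["?"] * len(words))
    conn.execute("DROP TABLE IF EXISTS temp.search_hits")
    conn.execute(
        "CREATE TEMPORARY TABLE search_hits ("
        " threadid INTEGER NOT NULL, wordid INTEGER NOT NULL,"
        " PRIMARY KEY (threadid, wordid))"
    )
    # Step 1: store each distinct (threadid, wordid) pair matching a search word.
    conn.execute(
        f"""INSERT INTO search_hits (threadid, wordid)
            SELECT DISTINCT p.threadid, w.wordid
            FROM word AS w
              JOIN postindex AS pi ON pi.wordid = w.wordid
              JOIN post AS p ON p.postid = pi.postid
            WHERE w.title IN ({placeholders})""",
        words,
    )
    # Step 2: keep threads whose pair count equals the number of search terms.
    rows = conn.execute(
        "SELECT threadid FROM search_hits GROUP BY threadid HAVING COUNT(*) = ?",
        (len(words),),
    ).fetchall()
    return sorted(row[0] for row in rows)


conn = make_db()
print(search_with_temp_table(conn, ["word1", "word2"]))  # prints [1, 3]
```

Because the pairs are already distinct, the final step can use a plain COUNT(*) instead of COUNT(DISTINCT ...), and the grouping runs over the small temporary table rather than the full join.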
