I have a large table like this:

CREATE TABLE IF NOT EXISTS `object_search` (
  `keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`keyword`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

It holds about 39 million rows (using over 1 GB of space) containing the indexed data for 1 million records in the object table (which object_id points to).

I currently search it with a query like this:

SELECT object_id, COUNT(object_id) AS hits
FROM object_search
WHERE keyword = 'woman' OR keyword = 'house'
GROUP BY object_id
HAVING hits = 2

This is already much faster than searching with LIKE on a combined keywords field in the object table, but it still takes up to 1 minute.

Its EXPLAIN output looks like this:

+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref   | rows   | filtered | Extra                    |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
|  1 | SIMPLE      | search | ref  | PRIMARY       | PRIMARY | 42      | const | 345180 |   100.00 | Using where; Using index |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+

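The full EXPLAIN, with the joins to the object, object_color and object_locale tables and with the query above running in a subquery to avoid overhead, looks like this: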

+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| id | select_type | table             | type   | possible_keys | key       | key_len | ref              | rows   | filtered | Extra                           |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL    | NULL          | NULL      | NULL    | NULL             | 182544 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | object_color      | eq_ref | object_id     | object_id | 4       | search.object_id |      1 |   100.00 |                                 |
|  1 | PRIMARY     | locale            | eq_ref | object_id     | object_id | 4       | search.object_id |      1 |   100.00 |                                 |
|  1 | PRIMARY     | object            | eq_ref | PRIMARY       | PRIMARY   | 4       | search.object_id |      1 |   100.00 |                                 |
|  2 | DERIVED     | search            | ref    | PRIMARY       | PRIMARY   | 42      |                  | 345180 |   100.00 | Using where; Using index        |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+

My primary goal is to be able to complete a search in 1 or 2 seconds.

So, are there any further techniques for improving the speed of the keyword search?


Update 2013-08-06:

Applying most of Neville K's suggestions, I now have the following setup:

CREATE TABLE `object_keyword` (
  `keyword_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` varchar(64) COLLATE latin1_german1_ci NOT NULL,
  PRIMARY KEY (`keyword_id`),
  FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

CREATE TABLE `object_search` (
  `keyword_id` int(10) unsigned NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`keyword_id`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The EXPLAIN output for the new query looks like this:

+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table          | type     | possible_keys      | key        | key_len | ref                       | rows    | filtered | Extra                                        |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>     | ALL      | NULL               | NULL       | NULL    | NULL                      |   24381 |   100.00 | Using temporary; Using filesort              |
|  1 | PRIMARY     | object_color   | eq_ref   | object_id          | object_id  | 4       | object_search.object_id   |       1 |   100.00 |                                              |
|  1 | PRIMARY     | object         | eq_ref   | PRIMARY            | PRIMARY    | 4       | object_search.object_id   |       1 |   100.00 |                                              |
|  1 | PRIMARY     | locale         | eq_ref   | object_id          | object_id  | 4       | object_search.object_id   |       1 |   100.00 |                                              |
|  2 | DERIVED     | <derived4>     | system   | NULL               | NULL       | NULL    | NULL                      |       1 |   100.00 |                                              |
|  2 | DERIVED     | <derived3>     | ALL      | NULL               | NULL       | NULL    | NULL                      |   24381 |   100.00 |                                              |
|  4 | DERIVED     | NULL           | NULL     | NULL               | NULL       | NULL    | NULL                      |    NULL |     NULL | No tables used                               |
|  3 | DERIVED     | object_keyword | fulltext | PRIMARY,keyword_ft | keyword_ft | 0       |                           |       1 |   100.00 | Using where; Using temporary; Using filesort |
|  3 | DERIVED     | object_search  | ref      | PRIMARY            | PRIMARY    | 4       | object_keyword.keyword_id | 2190225 |   100.00 | Using index                                  |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+

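The many DERIVED entries appear because the keyword comparison subquery is nested inside another subquery, which does nothing but count the number of rows returned: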

SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
    SELECT *, @rn := @rn + 1
    FROM (
        SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
        FROM object_keyword AS kwd
        INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
        WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
        GROUP BY search.object_id HAVING hits = 2
    ) AS numrowswrapper
    CROSS JOIN (SELECT @rn := 0) CONST
) AS turbo
INNER JOIN object AS object ON (turbo.object_id = object.object_id)
LEFT JOIN object_color AS object_color ON (turbo.object_id = object_color.object_id)
LEFT JOIN object_locale AS locale ON (turbo.object_id = locale.object_id)
ORDER BY timestamp_upload DESC

The query above actually runs in about 6 seconds, since it searches for two keywords. The more keywords I search for, the faster the search gets.

Is there any way to optimize this further?


Update 2013-08-07

The bottleneck is almost certainly the appended ORDER BY clause. Without it, the query executes in less than a second.

So, is there any way to sort the results faster? Any suggestions are welcome, even hackish ones that require post-processing somewhere else.


Update 2013-08-07, later that day

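Alright, ladies and gentlemen: nesting the WHERE and ORDER BY clauses in another layer of subquery, so that they do not touch tables they do not need, roughly doubled the performance again: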

SELECT wowrapper.*, locale.title
FROM (
    SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
    FROM (
        SELECT *, @rn := @rn + 1
        FROM (
            SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
            FROM object_keyword AS kwd
            INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
            WHERE MATCH (kwd.keyword) AGAINST ('+(frau)')
            GROUP BY search.object_id HAVING hits = 1
        ) AS numrowswrapper
        CROSS JOIN (SELECT @rn := 0) CONST
    ) AS search 
    INNER JOIN object AS object ON (search.object_id = object.object_id) 
    LEFT JOIN object_color AS color ON (search.object_id = color.object_id)
    WHERE 1
    ORDER BY object.object_id DESC
) AS wowrapper 
LEFT JOIN object_locale AS locale ON (wowrapper.object_id = locale.object_id)
LIMIT 0,48

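A search that took 12 seconds (single keyword, about 200K results) now takes 6 seconds, and a two-keyword search that took 6 seconds (60K results) now takes about 3.5 seconds.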

This is already a massive improvement, but is there any chance of pushing it even further?


Update 2013-08-08, early that day

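I scrapped that last nested variant of the query, since it actually slowed down other variants of it. I am now trying a different table layout with a FULLTEXT index on MyISAM: a dedicated search table with a combined keyword field (comma-separated in a TEXT field).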


Update 2013-08-08

Okay, a plain fulltext index does not really help.

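Back on the previous setup, the only thing that blocks is the ORDER BY (which uses a temporary table and a filesort). Without it, a search completes in less than a second!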

So basically what is left is:
How do I optimize the ORDER BY clause to run faster, possibly by eliminating the use of the temporary table?


3 Answers

Full-text search will be much faster than using the standard SQL string comparison features.

Secondly, if there is a high degree of redundancy in the keywords, you could consider a "many-to-many" implementation:

Keywords
--------
keyword_id
keyword

keyword_object
-------------
keyword_id
object_id

objects
-------
object_id
......

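If this reduces the string comparison from 39 million rows to 100K rows (roughly the size of an English dictionary), you may also see a distinct improvement: the query only has to perform 100K string comparisons, and joining on the integer keyword_id and object_id fields should be much faster than doing 39M string comparisons.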
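
A minimal sketch of this many-to-many layout, written with Python's built-in sqlite3 module as a stand-in for MySQL (so storage engines and FULLTEXT are not represented). The table names follow the outline above; the sample data and the find_objects helper are hypothetical and exist only for illustration. It assumes the searched keywords are distinct, since the HAVING clause compares the match count against the number of keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (
        keyword_id INTEGER PRIMARY KEY,
        keyword    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE keyword_object (
        keyword_id INTEGER NOT NULL REFERENCES keywords (keyword_id),
        object_id  INTEGER NOT NULL,
        PRIMARY KEY (keyword_id, object_id)
    );
""")

# Hypothetical sample data: object_id -> keywords attached to that object.
sample = {
    1: ["woman", "house"],
    2: ["woman"],
    3: ["house", "garden"],
    4: ["woman", "house", "garden"],
}

for object_id, words in sample.items():
    for word in words:
        # Each distinct keyword string is stored once; objects refer to its id.
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword = ?", (word,)
        ).fetchone()
        conn.execute("INSERT INTO keyword_object VALUES (?, ?)", (keyword_id, object_id))


def find_objects(conn, words):
    """Return the ids of objects tagged with every keyword in `words` (distinct words)."""
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT ko.object_id
        FROM keywords AS k
        JOIN keyword_object AS ko ON ko.keyword_id = k.keyword_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY ko.object_id
        HAVING COUNT(*) = ?
        ORDER BY ko.object_id
    """
    return [row[0] for row in conn.execute(sql, (*words, len(words)))]


matches = find_objects(conn, ["woman", "house"])
print(matches)
```

The string comparison only touches the small keywords table; the large keyword_object table is reached through integer keys alone.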

answered 2013-07-26T12:30:39.023

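The best solution would be a FULLTEXT search, but you may need a MyISAM table for that. You could set up a mirror table and keep it updated with some events and triggers, or, if you have a slave replicating from your server, you could change its table to MyISAM and use that for searching.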

For this query, the only thing I can think of is to rewrite it as:

SELECT s1.object_id
FROM object_search s1
JOIN object_search s2 ON s2.object_id = s1.object_id AND s2.keyword = 'word2'
JOIN object_search s3 ON s3.object_id = s1.object_id AND s3.keyword = 'word3'
....
WHERE s1.keyword = 'word1'

And I'm not sure it would be faster this way.

Also, you would need an index on object_id (assuming your PK is (keyword, object_id)).
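
A small runnable sketch of the self-join rewrite, again using Python's sqlite3 module as a stand-in for MySQL, so it says nothing about whether the rewrite is faster on the real table. The index name, the sample rows and the find_objects_by_self_join helper are hypothetical. The helper builds one extra join per additional keyword, mirroring the s1, s2, s3 pattern above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_search (
        keyword   TEXT    NOT NULL,
        object_id INTEGER NOT NULL,
        PRIMARY KEY (keyword, object_id)
    );
    -- Secondary index so each join can look rows up by object_id.
    CREATE INDEX object_search_object_id ON object_search (object_id);
""")

# Hypothetical sample rows: (keyword, object_id).
rows = [
    ("woman", 1), ("house", 1),
    ("woman", 2),
    ("house", 3), ("garden", 3),
    ("woman", 4), ("house", 4), ("garden", 4),
]
conn.executemany("INSERT INTO object_search VALUES (?, ?)", rows)


def find_objects_by_self_join(conn, words):
    """Return ids of objects that carry every keyword in `words` (at least one word)."""
    joins = []
    params = []
    # The first keyword drives s1; every further keyword adds a join s2, s3, ...
    for i, word in enumerate(words[1:], start=2):
        joins.append(
            f"JOIN object_search AS s{i} "
            f"ON s{i}.object_id = s1.object_id AND s{i}.keyword = ?"
        )
        params.append(word)
    sql = (
        "SELECT s1.object_id FROM object_search AS s1 "
        + " ".join(joins)
        + " WHERE s1.keyword = ? ORDER BY s1.object_id"
    )
    # Placeholders in the JOIN clauses come before the one in WHERE.
    params.append(words[0])
    return [row[0] for row in conn.execute(sql, params)]


print(find_objects_by_self_join(conn, ["woman", "house"]))
```

Unlike the GROUP BY ... HAVING form, this variant needs no aggregation step, at the cost of one join per keyword.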

answered 2013-07-26T11:00:48.147

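If you INSERT rarely and SELECT very often, you can optimize your data for reading, i.e. precompute the number of object_ids for each keyword and store it directly in the database. SELECT will then be very fast, but INSERT will take a few seconds.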
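
One possible reading of this suggestion, sketched with Python's sqlite3 module as a stand-in for MySQL: a separate table holds the number of objects per keyword and is kept current by triggers, so reads never have to count. The keyword_stats table, the trigger names, the sample rows and the keyword_count helper are all assumptions made for illustration; the answer does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_search (
        keyword   TEXT    NOT NULL,
        object_id INTEGER NOT NULL,
        PRIMARY KEY (keyword, object_id)
    );

    -- Hypothetical precomputed table: one row per keyword with its object count.
    CREATE TABLE keyword_stats (
        keyword      TEXT PRIMARY KEY,
        object_count INTEGER NOT NULL
    );

    -- Writes pay the extra cost of maintaining the counter.
    CREATE TRIGGER object_search_after_insert AFTER INSERT ON object_search
    BEGIN
        INSERT INTO keyword_stats (keyword, object_count)
        SELECT NEW.keyword, 0
        WHERE NOT EXISTS (SELECT 1 FROM keyword_stats WHERE keyword = NEW.keyword);
        UPDATE keyword_stats
        SET object_count = object_count + 1
        WHERE keyword = NEW.keyword;
    END;

    CREATE TRIGGER object_search_after_delete AFTER DELETE ON object_search
    BEGIN
        UPDATE keyword_stats
        SET object_count = object_count - 1
        WHERE keyword = OLD.keyword;
    END;
""")

# Hypothetical sample rows: (keyword, object_id).
rows = [
    ("woman", 1), ("house", 1),
    ("woman", 2),
    ("house", 3), ("garden", 3),
    ("woman", 4), ("house", 4), ("garden", 4),
]
conn.executemany("INSERT INTO object_search VALUES (?, ?)", rows)


def keyword_count(conn, word):
    """Read the stored count for one keyword; no scan of object_search is needed."""
    row = conn.execute(
        "SELECT object_count FROM keyword_stats WHERE keyword = ?", (word,)
    ).fetchone()
    return row[0] if row else 0


print(keyword_count(conn, "woman"))
```

Stored counts like these could, for example, let the application start the search from the rarest keyword, though whether that helps here would need measuring.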

answered 2013-07-26T11:04:22.617