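I have already read a lot on this subject, but each query still takes more than 30 seconds, and I am sure it should run much faster.
Here is the problem:
There is a large link table (40 million rows, 650 MB of data and 1.8 GB of indexes) defined as follows: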
CREATE TABLE IF NOT EXISTS `glossary_entry_wordList_1` (
`idTerm` mediumint(8) unsigned NOT NULL,
`idKeyword` mediumint(8) unsigned NOT NULL,
`termLength` smallint(6) NOT NULL,
`termNumberWords` tinyint(4) NOT NULL,
`termTransliteralRFC` mediumint(9) NOT NULL,
`keywordLength` tinyint(3) unsigned NOT NULL,
`termLanguage` tinyint(4) NOT NULL,
PRIMARY KEY (`idKeyword`,`idTerm`),
KEY `termTransliteralRFC` (`termTransliteralRFC`),
KEY `termLength` (`termLength`),
KEY `secondPrimary` (`idTerm`,`idKeyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
and a small temporary table defined as follows:
CREATE TEMPORARY TABLE IF NOT EXISTS `foundIDs` (
`searchId` int(11) NOT NULL,
`searchedKeywordId` int(11) NOT NULL,
`similarKeywordId` mediumint(8) unsigned NOT NULL,
`partsMatched` tinyint(4) NOT NULL,
`sumSimliarParts` int(11) NOT NULL,
`keywordLength` int(11) NOT NULL,
`fuzzyMark` float NOT NULL,
`keywordDjb2` bigint(20) NOT NULL,
`smallKeyword` tinyint(4) NOT NULL,
PRIMARY KEY (`similarKeywordId`),
KEY `searchId` (`searchId`),
KEY `searchedKeywordId` (`searchedKeywordId`),
KEY `partsMatched` (`partsMatched`),
KEY `keywordLength` (`keywordLength`),
KEY `smallKeyword` (`smallKeyword`),
KEY `keywordDjb2` (`keywordDjb2`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I need to retrieve from glossary_entry_wordList_1 all the idTerm values that are associated with at least 50% (or any other percentage) of the idKeyword values in the foundIDs table.
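In practice, I need to find all the sentences that contain x of the words.
To do this, I use a query like the following (note that the values in the conditions are only examples):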
SELECT glossary_entry_wordList_1.idTerm,
count( glossary_entry_wordList_1.idKeyword) as termsMatched ,
sum(foundIDs.keywordLength) as sumTermLength,
sum(foundIDs.fuzzyMark)/100 as sumFuzzy,
sum(foundIDs.partsMatched*foundIDs.keywordLength)/100 as sumLengthSimilar,
foundIDs.searchId
FROM foundIDs
inner join glossary_entry_wordList_1 on glossary_entry_wordList_1.idKeyword = foundIDs.similarKeywordId
WHERE
foundIDs.searchId = '17559' and
glossary_entry_wordList_1.termTransliteralRFC >= '824.4' and
glossary_entry_wordList_1.termLength>= '8.55' and
glossary_entry_wordList_1.termLength<= '18' and
foundIDs.smallKeyword = 0
GROUP BY glossary_entry_wordList_1.idTerm
HAVING count( glossary_entry_wordList_1.idKeyword)>'2'
order by null
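The fixed threshold in the HAVING clause above stands in for the percentage. A minimal sketch of how a 50% cutoff could be derived instead, assuming the percentage is counted over the foundIDs rows that pass the same searchId and smallKeyword filters. The variable name @minMatches is made up for illustration. The cutoff is computed in a separate statement because, as far as I know, MySQL does not allow a TEMPORARY table to be referenced more than once in the same query.

```sql
-- Hypothetical helper variable: number of matched keywords needed for a 50% cutoff.
SET @minMatches := (
    SELECT CEIL(0.5 * COUNT(*))
    FROM foundIDs
    WHERE searchId = 17559
      AND smallKeyword = 0
);

SELECT w.idTerm,
       COUNT(w.idKeyword) AS termsMatched
FROM foundIDs AS f
INNER JOIN glossary_entry_wordList_1 AS w
        ON w.idKeyword = f.similarKeywordId
WHERE f.searchId = 17559
  AND f.smallKeyword = 0
  -- the termLength / termTransliteralRFC range conditions from the query above go here
GROUP BY w.idTerm
HAVING termsMatched >= @minMatches
ORDER BY NULL;
```

This only changes how the threshold is expressed. It is not expected to make the join itself any faster.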
Here is the EXPLAIN output:
| id | select_type | table                     | type | possible_keys                          | key      | key_len | ref                       | rows | Extra                        |
| 1  | SIMPLE      | foundIDs                  | ref  | PRIMARY,searchId,smallKeyword          | searchId | 4       | const                     | 8    | Using where; Using temporary |
| 1  | SIMPLE      | glossary_entry_wordList_1 | ref  | PRIMARY,termTransliteralRFC,termLength | PRIMARY  | 3       | foundIDs.similarKeywordId | 146  | Using where                  |
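The engine behaves like this:
- the shorter the word (1-2 letters), the longer the query takes (obviously because short words have more associations)
- the more words there are in the search table (foundIDs), the longer the query takes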
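One possible direction, untested here: the EXPLAIN above shows the join going through PRIMARY and then applying "Using where", which on MyISAM should mean fetching each data row just to check termLength and termTransliteralRFC. A sketch of a covering index that would let those checks be answered from the index alone. The index name idKeyword_covering is made up, and whether the extra index size and build time are acceptable on a 40-million-row table is an open question.

```sql
-- Hypothetical covering index: join column first, then the filtered columns,
-- then idTerm so the GROUP BY column can also be read from the index.
ALTER TABLE glossary_entry_wordList_1
    ADD KEY idKeyword_covering (idKeyword, termLength, termTransliteralRFC, idTerm);
```

If the optimizer picks this index, the EXPLAIN row for glossary_entry_wordList_1 should show "Using index" in the Extra column.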
Any ideas on how to improve the query response time?
Thanks,