
I've read a lot about this, but every query still takes more than 30 seconds, and I'm sure it should run faster.

The problem is as follows:

There is a large link table (40 million rows; 650 MB of data and 1.8 GB of indexes), defined as follows:

CREATE TABLE IF NOT EXISTS `glossary_entry_wordList_1` (
  `idTerm` mediumint(8) unsigned NOT NULL,
  `idKeyword` mediumint(8) unsigned NOT NULL,
  `termLength` smallint(6) NOT NULL,
  `termNumberWords` tinyint(4) NOT NULL,
  `termTransliteralRFC` mediumint(9) NOT NULL,
  `keywordLength` tinyint(3) unsigned NOT NULL,
  `termLanguage` tinyint(4) NOT NULL,
  PRIMARY KEY (`idKeyword`,`idTerm`),
  KEY `termTransliteralRFC` (`termTransliteralRFC`),
  KEY `termLength` (`termLength`),
  KEY `secondPrimary` (`idTerm`,`idKeyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

And a small temporary table, defined as follows:

CREATE TEMPORARY TABLE IF NOT EXISTS `foundIDs` (
  `searchId` int(11) NOT NULL,
  `searchedKeywordId` int(11) NOT NULL,
  `similarKeywordId` mediumint(8) unsigned NOT NULL,
  `partsMatched` tinyint(4) NOT NULL,
  `sumSimliarParts` int(11) NOT NULL,
  `keywordLength` int(11) NOT NULL,
  `fuzzyMark` float NOT NULL,
  `keywordDjb2` bigint(20) NOT NULL,
  `smallKeyword` tinyint(4) NOT NULL,
  PRIMARY KEY (`similarKeywordId`),
  KEY `searchId` (`searchId`),
  KEY `searchedKeywordId` (`searchedKeywordId`),
  KEY `partsMatched` (`partsMatched`),
  KEY `keywordLength` (`keywordLength`),
  KEY `smallKeyword` (`smallKeyword`),
  KEY `keywordDjb2` (`keywordDjb2`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I need to retrieve from glossary_entry_wordList_1 every idTerm that is associated with at least 50% (or any other percentage) of the idKeyword values listed in the foundIDs table.

Essentially, I need to find all sentences that contain x of the searched words.
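The requirement can be pinned down with a small in-memory model. This is plain Python, not the author's code. `links` stands in for glossary_entry_wordList_1 as (idTerm, idKeyword) pairs, and `searched` stands in for the similarKeywordId values of one search in foundIDs. The helper name `terms_matching` and all sample IDs are hypothetical.

```python
from collections import defaultdict


def terms_matching(links, searched, percent):
    """Map each idTerm to its number of matched keywords, keeping only terms
    linked to at least `percent`% of the searched keyword IDs."""
    searched = set(searched)
    # Integer ceiling of percent% of the keyword count (avoids float rounding).
    needed = -(-percent * len(searched) // 100)
    matched = defaultdict(set)
    for id_term, id_keyword in links:
        if id_keyword in searched:
            matched[id_term].add(id_keyword)
    return {t: len(kws) for t, kws in matched.items() if len(kws) >= needed}


# Hypothetical (idTerm, idKeyword) pairs standing in for glossary_entry_wordList_1.
links = [(1, 10), (1, 11), (1, 12), (2, 10), (3, 11), (3, 12), (3, 13)]
print(terms_matching(links, searched=[10, 11, 12, 13], percent=50))  # {1: 3, 3: 3}
```

With 4 searched keywords and 50%, a term needs at least 2 of them. Term 2 (only keyword 10) is therefore dropped.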

For this I use a query like the following (note that the values in the conditions are only examples):

SELECT glossary_entry_wordList_1.idTerm, 
count( glossary_entry_wordList_1.idKeyword) as termsMatched , 
sum(foundIDs.keywordLength) as sumTermLength, 
sum(foundIDs.fuzzyMark)/100 as sumFuzzy, 
sum(foundIDs.partsMatched*foundIDs.keywordLength)/100 as sumLengthSimilar, 
foundIDs.searchId
FROM foundIDs
inner join glossary_entry_wordList_1 on glossary_entry_wordList_1.idKeyword = foundIDs.similarKeywordId
WHERE
foundIDs.searchId = '17559' and
glossary_entry_wordList_1.termTransliteralRFC >= '824.4' and
glossary_entry_wordList_1.termLength>= '8.55' and
glossary_entry_wordList_1.termLength<= '18' and
foundIDs.smallKeyword = 0
GROUP BY glossary_entry_wordList_1.idTerm
HAVING count( glossary_entry_wordList_1.idKeyword)>'2'
order by null
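The next sketch runs the same join/GROUP BY shape through Python's built-in `sqlite3` module, replacing the fixed `HAVING count(...) > '2'` with a percentage-based threshold. SQLite is only a stand-in here and says nothing about MyISAM/MEMORY performance. Several things differ from the original: `ORDER BY NULL` is dropped, only a few columns are kept, and the rows, the function name `run_search` and the `min_percent` parameter are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE glossary_entry_wordList_1 (
    idTerm INTEGER NOT NULL, idKeyword INTEGER NOT NULL,
    termLength INTEGER NOT NULL, termTransliteralRFC INTEGER NOT NULL,
    PRIMARY KEY (idKeyword, idTerm));
CREATE TABLE foundIDs (
    searchId INTEGER NOT NULL, similarKeywordId INTEGER PRIMARY KEY,
    keywordLength INTEGER NOT NULL, smallKeyword INTEGER NOT NULL);
"""

# Same join/filter/GROUP BY shape as the MySQL query; the HAVING threshold is
# a percentage of the search's non-small keywords instead of a constant.
QUERY = """
SELECT w.idTerm, COUNT(w.idKeyword) AS termsMatched,
       SUM(f.keywordLength) AS sumTermLength
FROM foundIDs AS f
JOIN glossary_entry_wordList_1 AS w ON w.idKeyword = f.similarKeywordId
WHERE f.searchId = :sid AND f.smallKeyword = 0
  AND w.termTransliteralRFC >= :rfc
  AND w.termLength BETWEEN :min_len AND :max_len
GROUP BY w.idTerm
HAVING COUNT(w.idKeyword) * 100 >= :pct * (
    SELECT COUNT(*) FROM foundIDs WHERE searchId = :sid AND smallKeyword = 0)
ORDER BY w.idTerm
"""


def run_search(conn, sid, min_percent, rfc=0, min_len=0, max_len=1000):
    params = {"sid": sid, "pct": min_percent, "rfc": rfc,
              "min_len": min_len, "max_len": max_len}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical rows: (idTerm, idKeyword, termLength, termTransliteralRFC).
conn.executemany("INSERT INTO glossary_entry_wordList_1 VALUES (?, ?, ?, ?)",
                 [(1, 10, 12, 900), (1, 11, 12, 900), (2, 10, 9, 900),
                  (3, 11, 15, 900), (3, 12, 15, 900)])
# Hypothetical search 17559 with three non-small keywords:
# (searchId, similarKeywordId, keywordLength, smallKeyword).
conn.executemany("INSERT INTO foundIDs VALUES (?, ?, ?, ?)",
                 [(17559, 10, 4, 0), (17559, 11, 5, 0), (17559, 12, 6, 0)])
print(run_search(conn, 17559, min_percent=50))  # [(1, 2, 9), (3, 2, 11)]
```

Multiplying the match count by 100 instead of dividing keeps the comparison in integer arithmetic. With 3 keywords at 50%, a term therefore needs 2 matches.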

Here is the EXPLAIN output:

id  select_type  table                      type  possible_keys                           key       key_len  ref                        rows  Extra
1   SIMPLE       foundIDs                   ref   PRIMARY,searchId,smallKeyword           searchId  4        const                      8     Using where; Using temporary
1   SIMPLE       glossary_entry_wordList_1  ref   PRIMARY,termTransliteralRFC,termLength  PRIMARY   3        foundIDs.similarKeywordId  146   Using where

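The engine behaves like this:
- The shorter the word (1-2 letters), the longer the query takes (obviously, because short words have more associations).
- The more words there are in the search table (foundIDs), the longer the query takes.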
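Both observations fit the nested-loop join shown in the EXPLAIN output. MySQL reads the matching foundIDs rows (estimated 8). For each one, it reads every glossary_entry_wordList_1 row with that idKeyword through the PRIMARY key (estimated 146 per keyword) and then applies the remaining WHERE conditions ("Using where"). The work therefore grows with the number of searched keywords and with each keyword's fan-out, which is largest for short words. The sketch below is only a rough model of that: it ignores I/O, caching and the GROUP BY, and the helper name `rows_examined` and the sample data are hypothetical.

```python
from collections import Counter


def rows_examined(links, searched):
    """Count link-table rows a nested-loop join reads: for every distinct
    searched keyword, all (idTerm, idKeyword) rows with that idKeyword."""
    fanout = Counter(id_keyword for _id_term, id_keyword in links)
    return sum(fanout[k] for k in set(searched))


# Hypothetical data: keyword 1 (a 1-2 letter word) occurs in 1000 terms,
# keyword 2 (a longer word) in only 5.
links = [(t, 1) for t in range(1000)] + [(t, 2) for t in range(5)]
print(rows_examined(links, [2]))     # 5
print(rows_examined(links, [1, 2]))  # 1005
print(8 * 146)                       # 1168, the optimizer's estimate from EXPLAIN
```

A single short keyword can dominate the cost, no matter how few keywords the search contains.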

Any ideas on how to improve the query response time?

Thanks,
