This query takes over a minute to complete:

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    GROUP BY keyword
    ORDER BY count(*) DESC
    LIMIT 5

Each keyword has an ID associated with it (the keyword_id column). That ID is used to look up the actual keyword in the keyword table.

movie_keyword has 2.8 million rows

keyword has 127,000 rows

However, returning only the most frequently used keyword IDs takes just 1 second:

SELECT keyword_id, count(*)
    FROM movie_keyword
    GROUP BY keyword_id
    ORDER BY count(*) DESC
    LIMIT 5

Is there a more efficient way to do this?

EXPLAIN output:

id  select_type  table          type  possible_keys  key            key_len  ref              rows    Extra
1   SIMPLE       keyword        ALL   PRIMARY        NULL           NULL     NULL             125405  Using temporary; Using filesort
1   SIMPLE       movie_keyword  ref   idx_keywordid  idx_keywordid  4        imdb.keyword.id  28      Using index

Table structure:

CREATE TABLE `movie_keyword` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `movie_id` int(11) NOT NULL,
  `keyword_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_mid` (`movie_id`),
  KEY `idx_keywordid` (`keyword_id`),
  KEY `keyword_ix` (`keyword_id`),
  CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
  CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;

CREATE TABLE `keyword` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `keyword` text NOT NULL,
  `phonetic_code` varchar(5) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_keyword` (`keyword`(5)),
  KEY `idx_pcode` (`phonetic_code`),
  KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;

3 Answers

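Untested, but in my opinion this should work and be significantly faster. I'm not sure whether LIMIT is allowed inside a subquery in MySQL, but there are other ways around that.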

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    WHERE movie_keyword.keyword_id IN (
        SELECT keyword_id
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY count(*) DESC
        LIMIT 5
    )
    GROUP BY keyword
    ORDER BY count(*) DESC;

This should be faster because you are not joining all 2.8 million rows of movie_keyword with keyword, only the ones that actually match, which I'd guess is far fewer.

Edit: since MySQL does not support LIMIT inside an IN subquery, you have to run this query first:

SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5;

Then, once you have its results, run the second query:

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
    GROUP BY keyword
    ORDER BY count(*) DESC;

RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS is replaced programmatically with the actual values, from whatever language you are using.
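
The two-step approach above can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for MySQL so that the example runs on its own, the sample rows are made up, and the helper name top_keywords is hypothetical. Rather than pasting the IDs into the SQL text, it builds one placeholder per ID; SQLite uses ? as the placeholder, while MySQL drivers for Python typically use %s.

```python
import sqlite3

# Toy stand-in for the real schema (assumption: SQLite instead of MySQL,
# with a handful of made-up rows). Table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword (id)
    );
""")
conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                 [(1, "murder"), (2, "sequel"), (3, "robot")])
conn.executemany(
    "INSERT INTO movie_keyword (movie_id, keyword_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3)])


def top_keywords(conn, limit=5):
    """Hypothetical helper: fetch the top IDs first, then resolve their names."""
    # Step 1: the fast query, which touches only movie_keyword.
    ids = [row[0] for row in conn.execute(
        "SELECT keyword_id FROM movie_keyword "
        "GROUP BY keyword_id ORDER BY COUNT(*) DESC, keyword_id LIMIT ?",
        (limit,))]
    if not ids:
        return []  # an empty IN () list would be a syntax error
    # Step 2: one placeholder per ID, so no values are pasted into the SQL.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        "SELECT keyword.keyword, COUNT(*) AS occurrences "
        "FROM movie_keyword "
        "JOIN keyword ON keyword.id = movie_keyword.keyword_id "
        f"WHERE movie_keyword.keyword_id IN ({placeholders}) "
        "GROUP BY keyword.id, keyword.keyword "
        "ORDER BY occurrences DESC, keyword.keyword", ids).fetchall()


print(top_keywords(conn, limit=2))
```

With the sample rows above, the first step yields the IDs 1 and 2, and the second step joins only the rows that carry those IDs.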

answered 2012-10-09T03:17:50.300

The query seems fine, but I think the structure is not. Try adding an index on the column

keyword.id

Try:

CREATE INDEX keyword_ix ON keyword (id);

or

ALTER TABLE keyword ADD INDEX keyword_ix (id);

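It would be better if you could post the structure of the tables keyword and movie_keyword. Which of the two is the main table and which is the referencing table?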

SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
     INNER JOIN  keyword
           ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY `Number of Occurences` DESC
LIMIT 5
answered 2012-10-09T02:22:50.557

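I know this is a pretty old question, but since I think xception forgot about derived tables in MySQL, I'd like to propose another solution. It needs only one query and avoids joining the large table. If anyone has a data set this large and can test it (maybe the question's author), please share the results.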

SELECT keyword.keyword, _temp.occurences
FROM (
  SELECT keyword_id, COUNT( keyword_id ) AS occurences
  FROM movie_keyword
  GROUP BY keyword_id
  ORDER BY occurences DESC 
  LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
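
As a rough check that grouping first and joining afterwards returns the same rows as the original join-then-group query, here is a small Python sketch. It rests on assumptions: SQLite in memory stands in for MySQL, the sample rows are made up, and the constant names are hypothetical. It says nothing about timing on the real 2.8 million rows, and the two queries only agree when each keyword text belongs to a single ID, because the original query groups by the keyword text rather than by the ID.

```python
import sqlite3

# Toy stand-in for the real schema (assumption: SQLite instead of MySQL,
# with a handful of made-up rows). Table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword (id)
    );
""")
conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                 [(1, "murder"), (2, "sequel"), (3, "robot")])
conn.executemany(
    "INSERT INTO movie_keyword (movie_id, keyword_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3)])

# Shape of the question's query: join every row, then group and limit.
JOIN_THEN_GROUP = """
    SELECT keyword.keyword, COUNT(*) AS occurences
    FROM movie_keyword
    JOIN keyword ON keyword.id = movie_keyword.keyword_id
    GROUP BY keyword.keyword
    ORDER BY occurences DESC
    LIMIT 2
"""

# Shape of this answer's query: group and limit inside a derived table,
# then join only the surviving rows to keyword.
GROUP_THEN_JOIN = """
    SELECT keyword.keyword, _temp.occurences
    FROM (
        SELECT keyword_id, COUNT(keyword_id) AS occurences
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY occurences DESC
        LIMIT 2
    ) AS _temp
    JOIN keyword ON _temp.keyword_id = keyword.id
    ORDER BY _temp.occurences DESC
"""

join_first = conn.execute(JOIN_THEN_GROUP).fetchall()
group_first = conn.execute(GROUP_THEN_JOIN).fetchall()
print(join_first == group_first, group_first)
```

The sample counts contain no ties, so the ordering of the two result sets is unambiguous; with tied counts, either query may return the tied keywords in a different order.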
answered 2014-10-14T14:55:09.597