
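This is driving me crazy. I have dumped the IMDb database using imdbpy. I am trying to find American movies, selected by the first letter of the title, for which cast data is available.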

Below is an example query that fetches the movies without considering cast data. It runs quite fast:

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
LIMIT 75

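The cast data is stored in a separate table named cast_info, which contains about 22 million records. The nr_order column holds the actor's billing order in the movie's credits. For example, Tom Hanks is 1 in Forrest Gump. There are typically a few dozen rows per movie_id.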

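So, to check whether cast data is available, there should be at least one row with a non-NULL nr_order for that particular movie_id. If all the nr_order values for a movie_id are NULL, the movie does not contain the data I need.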
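
In SQL terms, the check for a single movie would look roughly like the sketch below. The movie_id value 12345 and the has_cast_data alias are placeholders made up for illustration; the table and column names are the ones described above.

```sql
-- 12345 is a placeholder movie_id, not a real value from the dump.
-- Returns 1 if at least one cast_info row for this movie has a
-- non-NULL nr_order, otherwise 0.
SELECT EXISTS (
    SELECT 1
    FROM cast_info
    WHERE cast_info.movie_id = 12345
      AND cast_info.nr_order IS NOT NULL
) AS has_cast_data;
```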

To try to get this information, I use the following query:

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

INNER JOIN cast_info ON 
(cast_info.movie_id = title.id
AND
cast_info.nr_order = 1)

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
LIMIT 75

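For some reason, the query becomes very slow. The first query takes 0.3-0.7 seconds, while the second takes about 6-10 seconds. I added an index on cast_info.nr_order, but that did not help.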

EXPLAIN output:

+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
| id | select_type | table     | type  | possible_keys                                    | key               | key_len | ref          | rows  | Extra                       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
|  1 | SIMPLE      | title     | range | PRIMARY,title_idx_title,fk_kind_type_id_4        |  title_idx_title  | 257     | NULL         | 132801| Using where; Using temporary|
|  1 | SIMPLE      | movie_info| ref   | ovie_info_idx_mid,info_type_id movie_info_idx_mid| movie_info_idx_mid| 4       | imdb.title.id| 4     | Using where; Distinct       |
|  1 | SIMPLE      | table1    | ref   | cast_info_idx_mid,nr_order                       | cast_info_idx_mid | 4       | imdb.title.id| 12    | Using where; Distinct       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+

Any ideas would be very helpful!

Edit: EXPLAIN output from the first query:

+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
| id | select_type | table     | type  | possible_keys                                    | key               | key_len | ref          | rows  | Extra                       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+
|  1 | SIMPLE      | title     | range | PRIMARY,title_idx_title,fk_kind_type_id_4        |  title_idx_title  | 257     | NULL         | 132801| Using where; Using temporary|
|  1 | SIMPLE      | movie_info| ref   | ovie_info_idx_mid,info_type_id movie_info_idx_mid| movie_info_idx_mid| 4       | imdb.title.id| 4     | Using where; Distinct       |
+----+-------------+-----------+-------+--------------------------------------------------+-------------------+---------+--------------+-------+-----------------------------+

1 Answer


Since you only care whether cast information is available, you could try using EXISTS:

SELECT DISTINCT title.id,title.title,title.production_year
FROM  title

INNER JOIN movie_info ON
(movie_info.movie_id =  title.id
AND
movie_info.info_type_id = 8
AND
movie_info.info =  'USA') 

WHERE  title LIKE  'a%'
AND  title.kind_id =  1
AND EXISTS(SELECT 1 FROM cast_info WHERE cast_info.movie_id = title.id AND cast_info.nr_order IS NOT NULL)
LIMIT 75

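I'm not sure of the exact explanation for the behavior you are seeing, but DISTINCT may do something funny when there are a lot of rows in the join, or at least in the join product (note that in the EXPLAIN output, Distinct is applied to the cast_info table).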
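
Something else that might be worth trying alongside this, as an untested sketch: a composite index on both columns the subquery touches, so that MySQL may be able to answer the EXISTS check from the index alone instead of reading the cast_info rows. The index name cast_info_idx_mid_nr is made up; the column names are taken from the question. Building it on a table of roughly 22 million rows will take a while.

```sql
-- Hypothetical index name; movie_id and nr_order are the columns
-- described in the question.
CREATE INDEX cast_info_idx_mid_nr ON cast_info (movie_id, nr_order);

-- Re-run EXPLAIN on the EXISTS query to see whether the optimizer
-- actually chooses the new index for the subquery.
EXPLAIN
SELECT DISTINCT title.id, title.title, title.production_year
FROM title
INNER JOIN movie_info
    ON movie_info.movie_id = title.id
   AND movie_info.info_type_id = 8
   AND movie_info.info = 'USA'
WHERE title.title LIKE 'a%'
  AND title.kind_id = 1
  AND EXISTS (
      SELECT 1
      FROM cast_info
      WHERE cast_info.movie_id = title.id
        AND cast_info.nr_order IS NOT NULL
  )
LIMIT 75;
```

If the cast_info row of the new EXPLAIN output shows the new index in the key column, the subquery is using it; whether that actually reduces the runtime is something you would have to measure.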

Answered 2012-10-26T17:39:05.817