我对以下 PostgreSQL 查询有疑问,运行时间超过 10 秒有没有办法将此查询加速到合理的速度,我只是在一个非常大的数据库中寻找与视频相关的最相关的搜索词.
SELECT count(*), videoid
FROM term_search
where word = 'tester'
OR word = 'question'
OR word = 'one'
group by videoid
order by count(*) desc
limit 1800;
当使用分析运行查询时,生成的查询计划如下(http://explain.depesz.com/s/yDJ):
Limit (cost=389625.50..389630.00 rows=1800 width=4) (actual time=11766.693..11770.001 rows=1800 loops=1)
Output: (count(*)), videoid
-> Sort (cost=389625.50..389692.68 rows=26873 width=4) (actual time=11766.689..11767.818 rows=1800 loops=1)
Output: (count(*)), videoid
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 181kB
-> HashAggregate (cost=387769.41..388038.14 rows=26873 width=4) (actual time=9215.653..10641.993 rows=1632578 loops=1)
Output: count(*), videoid
-> Bitmap Heap Scan on public.term_search (cost=44915.83..378163.38 rows=1921207 width=4) (actual time=312.449..7026.036 rows=2047691 loops=1)
Output: id, videoid, word, termindex, weight
Recheck Cond: (((term_search.word)::text = 'tester'::text) OR ((term_search.word)::text = 'question'::text) OR ((term_search.word)::text = 'one'::text))
Rows Removed by Index Recheck: 25512434
-> BitmapOr (cost=44915.83..44915.83 rows=1950031 width=0) (actual time=288.937..288.937 rows=0 loops=1)
-> Bitmap Index Scan on terms_word_idx (cost=0.00..8552.83 rows=383502 width=0) (actual time=89.266..89.266 rows=419750 loops=1)
Index Cond: ((term_search.word)::text = 'tester'::text)
-> Bitmap Index Scan on terms_word_idx (cost=0.00..13171.84 rows=590836 width=0) (actual time=89.700..89.700 rows=604348 loops=1)
Index Cond: ((term_search.word)::text = 'question'::text)
-> Bitmap Index Scan on terms_word_idx (cost=0.00..21750.26 rows=975693 width=0) (actual time=109.964..109.964 rows=1023593 loops=1)
Index Cond: ((term_search.word)::text = 'one'::text)
该表的架构如下:
Column | Type | Modifiers | Storage | Description
-----------+------------------------+----------------------------------------------------------+----------+-------------
id | integer | not null default nextval('term_search_id_seq'::regclass) | plain |
videoid | integer | | plain |
word | character varying(100) | | extended |
termindex | character varying(15) | | extended |
weight | smallint | | plain |
Indexes:
"term_search_pkey" PRIMARY KEY, btree (id)
"search_term_exists_idx" btree (videoid, word)
"terms_caverphone_idx" btree (termindex)
"terms_video_idx" btree (videoid)
"terms_word_idx" btree (word, videoid)
Foreign-key constraints:
"term_search_videoid_fkey" FOREIGN KEY (videoid) REFERENCES videos(id) ON DELETE CASCADE
Has OIDs: no
我已经设法通过仅索引扫描将其缩短到 7 秒,但仍然不够低。我在 aws r3.xlarge 实例上的 Ubuntu 14.04 上运行 PostgreSQL 9.3,表中有大约 5000 万行。任何意见是极大的赞赏!
编辑:
附件是 SELECT schemaname,tablename,attname,null_frac,avg_width,n_distinct FROM pg_stats WHERE schemaname='public' and tablename='term_search'的结果;
schemaname | tablename | attname | null_frac | avg_width | n_distinct
------------+-------------+-----------+-----------+-----------+------------
public | term_search | id | 0 | 4 | -1
public | term_search | videoid | 0 | 4 | 568632
public | term_search | word | 0 | 6 | 5054
public | term_search | termindex | 0 | 11 | 2485
public | term_search | weight | 0 | 2 | 3