sql - Optimizing a query with count and an index

Question

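I have a table with more than 200,000,000 tuples. I frequently have to run the following query and show the result on a web page, and it takes a very long time: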

select distinct(source), count(hitid) from tb_hit group by source;

I have created an index, but the query does not use it:

CREATE INDEX tb_hit_idx_5 on tb_hit USING btree (HitId ASC,Source ASC);

Here is the query plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=10702925.57..10702925.62 rows=6 width=13) (actual time=330574.690..330574.705 rows=7 loops=1)
   ->  Sort  (cost=10702925.57..10702925.59 rows=6 width=13) (actual time=330574.689..330574.691 rows=7 loops=1)
         Sort Key: source, (count(hitid))
         Sort Method: quicksort  Memory: 25kB
         ->  Finalize GroupAggregate  (cost=10702919.26..10702925.50 rows=6 width=13) (actual time=330574.507..330574.647 rows=7 loops=1)
               Group Key: source
               ->  Gather Merge  (cost=10702919.26..10702925.20 rows=48 width=13) (actual time=330574.454..330588.679 rows=63 loops=1)
                     Workers Planned: 8
                     Workers Launched: 8
                     ->  Sort  (cost=10701919.12..10701919.13 rows=6 width=13) (actual time=330561.376..330561.378 rows=7 loops=9)
                           Sort Key: source
                           Sort Method: quicksort  Memory: 25kB
                           Worker 0:  Sort Method: quicksort  Memory: 25kB
                           Worker 1:  Sort Method: quicksort  Memory: 25kB
                           Worker 2:  Sort Method: quicksort  Memory: 25kB
                           Worker 3:  Sort Method: quicksort  Memory: 25kB
                           Worker 4:  Sort Method: quicksort  Memory: 25kB
                           Worker 5:  Sort Method: quicksort  Memory: 25kB
                           Worker 6:  Sort Method: quicksort  Memory: 25kB
                           Worker 7:  Sort Method: quicksort  Memory: 25kB
                           ->  Partial HashAggregate  (cost=10701918.98..10701919.04 rows=6 width=13) (actual time=330561.260..330561.265 rows=7 loops=9)
                                 Group Key: source
                                 ->  Parallel Seq Scan on tb_hit  (cost=0.00..10523012.32 rows=35781332 width=13) (actual time=4.019..303398.636 rows=31814705 loops=9)

And here is the EXPLAIN output after running set enable_seqscan = OFF;

QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=16625420.17..16625420.22 rows=6 width=13) (actual time=393693.931..393693.940 rows=7 loops=1)
  ->  Sort  (cost=16625420.17..16625420.19 rows=6 width=13) (actual time=393693.929..393693.930 rows=7 loops=1)
        Sort Key: source, (count(hitid))
        Sort Method: quicksort  Memory: 25kB
        ->  Finalize GroupAggregate  (cost=16625413.86..16625420.10 rows=6 width=13) (actual time=393693.825..393693.902 rows=7 loops=1)
              Group Key: source
              ->  Gather Merge  (cost=16625413.86..16625419.80 rows=48 width=13) (actual time=393693.784..395576.863 rows=63 loops=1)
                    Workers Planned: 8
                    Workers Launched: 8
                    ->  Sort  (cost=16624413.72..16624413.73 rows=6 width=13) (actual time=393680.090..393680.092 rows=7 loops=9)
                          Sort Key: source
                          Sort Method: quicksort  Memory: 25kB
                          Worker 0:  Sort Method: quicksort  Memory: 25kB
                          Worker 1:  Sort Method: quicksort  Memory: 25kB
                          Worker 2:  Sort Method: quicksort  Memory: 25kB
                          Worker 3:  Sort Method: quicksort  Memory: 25kB
                          Worker 4:  Sort Method: quicksort  Memory: 25kB
                          Worker 5:  Sort Method: quicksort  Memory: 25kB
                          Worker 6:  Sort Method: quicksort  Memory: 25kB
                          Worker 7:  Sort Method: quicksort  Memory: 25kB
                          ->  Partial HashAggregate  (cost=16624413.58..16624413.64 rows=6 width=13) (actual time=393679.954..393679.959 rows=7 loops=9)
                                Group Key: source
                                ->  Parallel Bitmap Heap Scan on tb_hit  (cost=5922341.42..16445455.86 rows=35791544 width=13) (actual time=52043.284..367453.059 rows=31814705 loops=9)
                                      Heap Blocks: exact=1216152
                                      ->  Bitmap Index Scan on tb_hit_idx_5  (cost=0.00..5850758.33 rows=286332352 width=0) (actual time=40833.844..40833.844 rows=286332344 loops=1)
Planning Time: 0.366 ms
Execution Time: 395577.824 ms
(27 rows)

score 0 · Accepted Answer

First: DISTINCT is redundant here and you should remove it. GROUP BY already guarantees that each source appears only once.
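As a small illustration of that point, the query from the question can be written without DISTINCT and returns the same rows. The second form is only equivalent under the assumption that hitid is declared NOT NULL, which the question does not state.

```sql
-- Same result as the original query: GROUP BY already yields one row per source.
SELECT source, count(hitid)
FROM tb_hit
GROUP BY source;

-- Assumption: hitid is NOT NULL. Only then does count(*) give the same numbers,
-- because count(hitid) skips rows where hitid is NULL.
SELECT source, count(*)
FROM tb_hit
GROUP BY source;
```

Dropping DISTINCT removes the Unique and top-level Sort nodes from the plan, but as those nodes only handle 7 rows, the run time will barely change.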

DISTINCT is often a performance problem, but here the situation is simpler: the sheer number of rows dominates the execution time.

There is no way around reading every row, and an index will not help you here.

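What you can do is create a summary table that holds the desired result and is updated by a trigger whenever the base table is modified, so that the counts are always accurate.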

Then you can query that summary table, which will be very fast. The price you pay is the run time of the trigger during data modifications.
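The summary-table approach could look roughly like the sketch below. It is not taken from the original answer: the names tb_hit_source_count, hit_count and tb_hit_count_trig are hypothetical, and it assumes that source is a non-NULL text column and that the server is PostgreSQL 11 or later.

```sql
-- Hypothetical summary table: one row per source.
-- Assumption: source is text and never NULL (a primary key rejects NULLs).
CREATE TABLE tb_hit_source_count (
    source    text   PRIMARY KEY,
    hit_count bigint NOT NULL
);

-- Seed the table once. This still scans all of tb_hit, and it should run
-- while nothing else modifies tb_hit, or the counts start out wrong.
INSERT INTO tb_hit_source_count (source, hit_count)
SELECT source, count(hitid)
FROM tb_hit
GROUP BY source;

-- Trigger function that keeps the counts in step with tb_hit.
-- It mirrors count(hitid): rows with a NULL hitid are not counted.
CREATE FUNCTION tb_hit_count_trig() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- Remove the old row's contribution on DELETE and UPDATE.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        IF OLD.hitid IS NOT NULL THEN
            UPDATE tb_hit_source_count
            SET hit_count = hit_count - 1
            WHERE source = OLD.source;
        END IF;
    END IF;

    -- Add the new row's contribution on INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        IF NEW.hitid IS NOT NULL THEN
            INSERT INTO tb_hit_source_count (source, hit_count)
            VALUES (NEW.source, 1)
            ON CONFLICT (source)
            DO UPDATE SET hit_count = tb_hit_source_count.hit_count + 1;
        END IF;
    END IF;

    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

-- On PostgreSQL 10 and older, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION.
CREATE TRIGGER tb_hit_count_trig
AFTER INSERT OR UPDATE OR DELETE ON tb_hit
FOR EACH ROW EXECUTE FUNCTION tb_hit_count_trig();

-- The web page then reads the precomputed counts.
SELECT source, hit_count
FROM tb_hit_source_count;
```

Two caveats about this sketch: TRUNCATE on tb_hit does not fire row-level triggers, so it would need separate handling, and because the summary table has only a handful of rows, concurrent writers to tb_hit will queue on the same summary rows until each transaction commits.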
