sql - Optimizing a slow SQL query that is fast when run in two separate steps

Question

I'm having trouble optimizing the following SQL query (using PostgreSQL 9.1):

WITH regions AS (
    SELECT r1.region_id
      FROM region r1, 
           (SELECT * 
              FROM region 
             WHERE region_id = 1) r2
     WHERE (r1.region_country = r2.region_country
             OR r2.region_country = 0) 
       AND (r1.region_province = r2.region_province 
             OR r2.region_province = 0) 
       AND (r1.region_area = r2.region_area 
             OR r2.region_area = 0))

SELECT id 
  FROM users 
 WHERE user_region in (SELECT region_id 
                         FROM regions);

EXPLAIN produces the following output:

Nested Loop  (cost=85.02..42405.93 rows=13217 width=4) (actual time=0.447..970.132 rows=527444 loops=1)                                                                                                                          
  Buffers: shared hit=464136                                                                                                                                                                                                     
  CTE regions                                                                                                                                                                                                                    
    ->  Nested Loop  (cost=0.00..32.11 rows=5 width=4) (actual time=0.029..0.237 rows=135 loops=1)                                                                                                                               
          Join Filter: (((r1.region_country = region.region_country) OR (region.region_country = 0)) AND ((r1.region_province = region.region_province) OR (region.region_province = 0)) AND ((r1.region_area = region.region_area) OR (region.region_area = 0))) 
          Buffers: shared hit=7                                                                                                                                                                                                  
          ->  Index Scan using region_pkey on region  (cost=0.00..8.27 rows=1 width=6) (actual time=0.015..0.016 rows=1 loops=1)                                                                                                 
                Index Cond: (re_nr = 1)                                                                                                                                                                                          
                Buffers: shared hit=3                                                                                                                                                                                            
          ->  Seq Scan on region r1  (cost=0.00..9.67 rows=567 width=10) (actual time=0.007..0.072 rows=567 loops=1)                                                                                                             
                Buffers: shared hit=4                                                                                                                                                                                            
  ->  HashAggregate  (cost=0.11..0.16 rows=5 width=4) (actual time=0.326..0.449 rows=135 loops=1)                                                                                                                                
        Buffers: shared hit=7                                                                                                                                                                                                    
        ->  CTE Scan on regions  (cost=0.00..0.10 rows=5 width=4) (actual time=0.032..0.278 rows=135 loops=1)                                                                                                                    
              Buffers: shared hit=7                                                                                                                                                                                              
  ->  Bitmap Heap Scan on users  (cost=52.79..8441.69 rows=2643 width=8) (actual time=1.442..6.459 rows=3907 loops=135)                                                                                                   
        Recheck Cond: (user_region = regions.region_id)                                                                                                                                                                            
        Buffers: shared hit=464129                                                                                                                                                                                               
        ->  Bitmap Index Scan on user_region  (cost=0.00..52.13 rows=2643 width=0) (actual time=0.675..0.675 rows=3909 loops=135)                                                                                              
              Index Cond: (user_region = regions.region_id)                                                                                                                                                                        
              Buffers: shared hit=1847                                                                                                                                                                                           
Total runtime: 1003.867 ms

If I simply paste the output of the regions query into the IN list, everything is as fast as expected.

SELECT id
FROM users
WHERE user_region in (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110)

EXPLAIN produces the following output:

Bitmap Heap Scan on users  (cost=5643.57..135774.21 rows=322812 width=4) (actual time=138.339..365.676 rows=527444 loops=1)                                                                                                                                                                                                                                                                                                                                                                         
  Recheck Cond: (user_region = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110}'::integer[]))     
  Buffers: shared hit=72973 read=1302                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
  ->  Bitmap Index Scan on user_region  (cost=0.00..5562.86 rows=322812 width=0) (actual time=114.446..114.446 rows=527752 loops=1)                                                                                                                                                                                                                                                                                                                                                                      
        Index Cond: (user_region = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110}'::integer[])) 
        Buffers: shared hit=546 read=1301                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
Total runtime: 397.975 ms

Running the regions query on its own is also very fast.

Nested Loop  (cost=0.00..32.11 rows=5 width=4) (actual time=0.059..12.323 rows=135 loops=1)                                                                                                                              
  Join Filter: (((r1.region_country = region.region_country) OR (region.region_country = 0)) AND ((r1.region_province = region.region_province) OR (region.region_province = 0)) AND ((r1.region_area = region.region_area) OR (region.region_area = 0))) 
  Buffers: shared hit=1 read=6                                                                                                                                                                                           
  ->  Index Scan using region_pkey on region  (cost=0.00..8.27 rows=1 width=6) (actual time=0.044..0.046 rows=1 loops=1)                                                                                                 
        Index Cond: (re_nr = 1)                                                                                                                                                                                          
        Buffers: shared read=3                                                                                                                                                                                           
  ->  Seq Scan on region r1  (cost=0.00..9.67 rows=567 width=10) (actual time=0.005..12.122 rows=567 loops=1)                                                                                                            
        Buffers: shared hit=1 read=3                                                                                                                                                                                     
Total runtime: 12.379 ms

If I add more columns to the select from users.

Is there a way to compute everything in a single fast query?

Any help or pointers toward a solution are much appreciated.

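[Edit] Adding a sample of the region table, as requested in the comments.

A user can choose a region (user_region), which can be a country, a province, or a city/part of a city. The regions query tries to find all region_ids within that country, province, or city. If the user selects Austria (region_id = 1), all other region_ids from Austria should be returned. If the user selects "Lower Austria" (region_id = 26), all regions in the province of Lower Austria should be returned (27, 28, 29, 30 in the sample data).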

select * from region limit 30;
 region_country | region_province | region_area |     region_name     | region_id 
----------------+-----------------+-------------+---------------------+-----------
              1 |               0 |           0 | Austria             |         1
              1 |               1 |           0 | Vienna              |         2
              1 |               1 |           1 | Vienna 1            |         3
              1 |               1 |           2 | Vienna 2            |         4
              1 |               1 |           3 | Vienna 3            |         5
              1 |               1 |           4 | Vienna 4            |         6
              1 |               1 |           5 | Vienna 5            |         7
              1 |               1 |           6 | Vienna 6            |         8
              1 |               1 |           7 | Vienna 7            |         9
              1 |               1 |           8 | Vienna 8            |        10
              1 |               1 |           9 | Vienna 9            |        11
              1 |               1 |          10 | Vienna 10           |        12
              1 |               1 |          11 | Vienna 11           |        13
              1 |               1 |          12 | Vienna 12           |        14
              1 |               1 |          13 | Vienna 13           |        15
              1 |               1 |          14 | Vienna 14           |        16
              1 |               1 |          15 | Vienna 15           |        17
              1 |               1 |          16 | Vienna 16           |        18
              1 |               1 |          17 | Vienna 17           |        19
              1 |               1 |          18 | Vienna 18           |        20
              1 |               1 |          19 | Vienna 19           |        21
              1 |               1 |          20 | Vienna 20           |        22
              1 |               1 |          21 | Vienna 21           |        23
              1 |               1 |          22 | Vienna 22           |        24
              1 |               1 |          23 | Vienna 23           |        25
              1 |               2 |           0 | Lower Austria       |        26
              1 |               2 |           1 | St.Pölten           |        27
              1 |               2 |           2 | Amstetten           |        28
              1 |               2 |           3 | Baden               |        29
              1 |               2 |           4 | Bruck an der Leitha |        30
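
To make the matching rule concrete, the inner part of the regions query can be run on its own for Lower Austria. This is only a sketch that reuses the CTE body from above with region_id = 26 substituted. Against the 30 sample rows shown it should match region_ids 26 through 30, because the selected region also matches itself.

```sql
-- r2 is the selected region; a 0 in one of its columns acts as a wildcard.
-- For region_id = 26 that row is (country 1, province 2, area 0),
-- so country and province must match and any area is accepted.
SELECT r1.region_id, r1.region_name
  FROM region r1,
       (SELECT *
          FROM region
         WHERE region_id = 26) r2
 WHERE (r1.region_country = r2.region_country
         OR r2.region_country = 0)
   AND (r1.region_province = r2.region_province
         OR r2.region_province = 0)
   AND (r1.region_area = r2.region_area
         OR r2.region_area = 0)
 ORDER BY r1.region_id;
```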

score 1 · Accepted Answer

A join is usually more efficient than an IN clause:

.
.
.
SELECT id FROM users 
  INNER JOIN regions ON user_region = region_id;

Assuming each user matches only one region (which seems to be the case, judging by your query), this will give you the same results.
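
Written out in full, with the elided part filled in by the CTE from the question, the join variant might look like the sketch below. The table aliases u and r are added here for readability and are not part of the original. It also assumes region_id is unique in region (the plan's use of region_pkey suggests so), so the join does not duplicate user rows.

```sql
-- Same CTE as in the question: every region that falls under region 1.
WITH regions AS (
    SELECT r1.region_id
      FROM region r1,
           (SELECT *
              FROM region
             WHERE region_id = 1) r2
     WHERE (r1.region_country = r2.region_country
             OR r2.region_country = 0)
       AND (r1.region_province = r2.region_province
             OR r2.region_province = 0)
       AND (r1.region_area = r2.region_area
             OR r2.region_area = 0))

-- Join instead of IN (...): one output row per user whose region is in the CTE.
SELECT u.id
  FROM users u
  INNER JOIN regions r ON u.user_region = r.region_id;
```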

score 0 · Accepted Answer

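Have you tried running ANALYZE on your tables?

From the EXPLAIN output you posted, I can see that Postgres expected about 39 times fewer rows than were actually returned (rows=13217 estimated versus rows=527444 actual).

When Postgres's estimates differ that much from the actual result set, it can pick a suboptimal plan, and the query takes longer to complete.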
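
A minimal sketch of that check, assuming the tables are named region and users as in the question: refresh the statistics, then re-run the query under EXPLAIN and compare the estimated and actual row counts again. Whether the estimates improve depends on the data, so treat this as a diagnostic step rather than a guaranteed fix.

```sql
-- Refresh the planner statistics for both tables involved.
ANALYZE region;
ANALYZE users;

-- Re-run the original query with timing and buffer counters, then compare
-- the estimated "rows=" against the actual "rows=" on each plan node.
EXPLAIN (ANALYZE, BUFFERS)
WITH regions AS (
    SELECT r1.region_id
      FROM region r1,
           (SELECT *
              FROM region
             WHERE region_id = 1) r2
     WHERE (r1.region_country = r2.region_country
             OR r2.region_country = 0)
       AND (r1.region_province = r2.region_province
             OR r2.region_province = 0)
       AND (r1.region_area = r2.region_area
             OR r2.region_area = 0))
SELECT id
  FROM users
 WHERE user_region IN (SELECT region_id
                         FROM regions);
```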
