8

我目前有一个由于 OR 语句而很慢的 postgresql 查询。因为它显然没有使用索引。到目前为止,重写此查询失败。

查询:

EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  a0_.advert_category_id IN ( 1136 )
        OR a1_.parent_id IN ( 1136 )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                           QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=1.658..50.809 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.657..50.790 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.013..16.449 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
               Rows Removed by Filter: 1
 Total runtime: 50.860 ms

慢的原因:Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))

我尝试使用 INNER JOIN 而不是 WHERE 语句:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
                  AND ( a0_.advert_category_id IN ( 1136 )
                         OR a1_.parent_id IN ( 1136 ) )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=4.667..139.955 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=4.666..139.932 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.019..100.765 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
               Rows Removed by Filter: 1
 Total runtime: 140.048 ms

当我删除 OR 条件之一时,查询速度会加快。所以我做了一个 UNION 来查看结果。它非常快!但我不认为这是一个解决方案:

EXPLAIN ANALYZE 
 (SELECT a0_.id AS id0
 FROM   advert a0_
        INNER JOIN advertcategory a1_
                ON a0_.advert_category_id = a1_.id
 WHERE  a0_.advert_category_id IN ( 1136 )
 ORDER  BY a0_.created_date DESC
 LIMIT  15)
UNION
(SELECT a0_.id AS id0
 FROM   advert a0_
        INNER JOIN advertcategory a1_
                ON a0_.advert_category_id = a1_.id
 WHERE  a1_.parent_id IN ( 1136 )
 ORDER  BY a0_.created_date DESC
 LIMIT  15);  

                                                                               QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=4125.70..4126.00 rows=30 width=12) (actual time=7.945..7.951 rows=15 loops=1)
   ->  Append  (cost=1120.82..4125.63 rows=30 width=12) (actual time=6.811..7.929 rows=15 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=1120.82..1121.01 rows=15 width=12) (actual time=6.810..6.840 rows=15 loops=1)
               ->  Limit  (cost=1120.82..1120.86 rows=15 width=12) (actual time=6.809..6.825 rows=15 loops=1)
                     ->  Sort  (cost=1120.82..1121.56 rows=295 width=12) (actual time=6.807..6.813 rows=15 loops=1)
                           Sort Key: a0_.created_date
                           Sort Method: top-N heapsort  Memory: 25kB
                           ->  Nested Loop  (cost=10.59..1113.59 rows=295 width=12) (actual time=1.151..6.639 rows=220 loops=1)
                                 ->  Index Only Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..8.27 rows=1 width=4) (actual time=1.030..1.033 rows=1 loops=1)
                                       Index Cond: (id = 1136)
                                       Heap Fetches: 1
                                 ->  Bitmap Heap Scan on advert a0_  (cost=10.59..1102.37 rows=295 width=16) (actual time=0.099..5.287 rows=220 loops=1)
                                       Recheck Cond: (advert_category_id = 1136)
                                       ->  Bitmap Index Scan on idx_54f1f40bd4436821  (cost=0.00..10.51 rows=295 width=0) (actual time=0.073..0.073 rows=220 loops=1)
                                             Index Cond: (advert_category_id = 1136)
         ->  Subquery Scan on "*SELECT* 2"  (cost=3004.43..3004.62 rows=15 width=12) (actual time=1.072..1.072 rows=0 loops=1)
               ->  Limit  (cost=3004.43..3004.47 rows=15 width=12) (actual time=1.071..1.071 rows=0 loops=1)
                     ->  Sort  (cost=3004.43..3005.99 rows=626 width=12) (actual time=1.069..1.069 rows=0 loops=1)
                           Sort Key: a0_.created_date
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=22.91..2989.07 rows=626 width=12) (actual time=1.056..1.056 rows=0 loops=1)
                                 ->  Index Scan using idx_d84ab8ea727aca70 on advertcategory a1_  (cost=0.00..8.27 rows=1 width=4) (actual time=1.054..1.054 rows=0 loops=1)
                                       Index Cond: (parent_id = 1136)
                                 ->  Bitmap Heap Scan on advert a0_  (cost=22.91..2972.27 rows=853 width=16) (never executed)
                                       Recheck Cond: (advert_category_id = a1_.id)
                                       ->  Bitmap Index Scan on idx_54f1f40bd4436821  (cost=0.00..22.70 rows=853 width=0) (never executed)
                                             Index Cond: (advert_category_id = a1_.id)
 Total runtime: 8.940 ms
(28 rows)

尝试反转 IN 语句:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  1136 IN ( a0_.advert_category_id, a1_.parent_id )
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..27542.49 rows=15 width=12) (actual time=1.848..62.461 rows=15 loops=1)
   ->  Nested Loop  (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.847..62.441 rows=15 loops=1)
         ->  Index Scan Backward using advert_created_date_idx on advert a0_  (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.028..27.316 rows=12405 loops=1)
         ->  Index Scan using advertcategory_pkey on advertcategory a1_  (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
               Index Cond: (id = a0_.advert_category_id)
               Filter: ((1136 = a0_.advert_category_id) OR (1136 = parent_id))
               Rows Removed by Filter: 1
 Total runtime: 62.506 ms
(8 rows)

尝试使用 EXISTS:

EXPLAIN ANALYZE  SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  EXISTS(SELECT test.id
              FROM   advert test
                     INNER JOIN advertcategory test_cat
                             ON test_cat.id = test.advert_category_id
              WHERE  test.id = a0_.id
                     AND ( test.advert_category_id IN ( 1136 )
                            OR test_cat.parent_id IN ( 1136 ) ))
ORDER  BY a0_.created_date DESC
LIMIT  15;  

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=45538.18..45538.22 rows=15 width=12) (actual time=524.654..524.673 rows=15 loops=1)
   ->  Sort  (cost=45538.18..45540.48 rows=921 width=12) (actual time=524.651..524.658 rows=15 loops=1)
         Sort Key: a0_.created_date
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Join  (cost=39803.59..45515.58 rows=921 width=12) (actual time=497.362..524.436 rows=220 loops=1)
               Hash Cond: (a0_.advert_category_id = a1_.id)
               ->  Nested Loop  (cost=39786.88..45486.21 rows=921 width=16) (actual time=496.748..523.501 rows=220 loops=1)
                     ->  HashAggregate  (cost=39786.88..39796.09 rows=921 width=4) (actual time=496.705..496.872 rows=220 loops=1)
                           ->  Hash Join  (cost=16.71..39784.58 rows=921 width=4) (actual time=1.210..496.294 rows=220 loops=1)
                                 Hash Cond: (test.advert_category_id = test_cat.id)
                                 Join Filter: ((test.advert_category_id = 1136) OR (test_cat.parent_id = 1136))
                                 Rows Removed by Join Filter: 353584
                                 ->  Seq Scan on advert test  (cost=0.00..33134.04 rows=353804 width=8) (actual time=0.002..177.953 rows=353804 loops=1)
                                 ->  Hash  (cost=9.65..9.65 rows=565 width=8) (actual time=0.622..0.622 rows=565 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 22kB
                                       ->  Seq Scan on advertcategory test_cat  (cost=0.00..9.65 rows=565 width=8) (actual time=0.005..0.327 rows=565 loops=1)
                     ->  Index Scan using advert_pkey on advert a0_  (cost=0.00..6.17 rows=1 width=16) (actual time=0.117..0.118 rows=1 loops=220)
                           Index Cond: (id = test.id)
               ->  Hash  (cost=9.65..9.65 rows=565 width=4) (actual time=0.604..0.604 rows=565 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 20kB
                     ->  Seq Scan on advertcategory a1_  (cost=0.00..9.65 rows=565 width=4) (actual time=0.010..0.285 rows=565 loops=1)
 Total runtime: 524.797 ms

广告表(精简):

353804 rows
                                                                   Table "public.advert"
           Column            |              Type              |                      Modifiers                      | Storage  | Stats target | Description 
-----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
 id                          | integer                        | not null default nextval('advert_id_seq'::regclass) | plain    |              | 
 advert_category_id          | integer                        | not null                                            | plain    |              | 
Indexes:
    "idx_54f1f40bd4436821" btree (advert_category_id)
    "advert_created_date_idx" btree (created_date)
Foreign-key constraints:
    "fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Has OIDs: no

类别表(精简):

565 rows

                           Table "public.advertcategory"
  Column   |  Type   |                          Modifiers                          
-----------+---------+-------------------------------------------------------------
 id        | integer | not null default nextval('advertcategory_id_seq'::regclass)
 parent_id | integer | 
 active    | boolean | not null
 system    | boolean | not null
Indexes:
    "advertcategory_pkey" PRIMARY KEY, btree (id)
    "idx_d84ab8ea727aca70" btree (parent_id)
Foreign-key constraints:
    "fk_d84ab8ea727aca70" FOREIGN KEY (parent_id) REFERENCES advertcategory(id) ON DELETE RESTRICT

简短的服务器配置:

                                                   version                                                    
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

            name            |  current_setting   |        source        
----------------------------+--------------------+----------------------
 shared_buffers             | 1800MB             | configuration file
 work_mem                   | 4MB                | configuration file

如您所见,没有一个合适的解决方案可以提高速度。只有拆分 OR 语句的 UNION 解决方案才能提高性能。但我不能使用它,因为这个查询是通过我的 ORM 框架与许多其他过滤器选项一起使用的。另外,如果我能做到这一点,为什么优化器不这样做呢?这似乎是一个非常简单的优化。

对此有任何提示吗?解决这个小问题将不胜感激!

4

2 回答 2

4

全新的方法。您的where情况在两张桌子上,但这似乎没有必要。

第一个变化是:

where a1_.id = 1136 or a1_.parent_id = 1136

我认为您想要的结构是对类别表进行扫描,然后从广告表中获取。为了提供帮助,您可以在advert(advert_category_id, created_date).

我很想通过将where子句移动到子查询中来编写查询。我不知道这是否会影响性能:

SELECT a0_.id AS id0
FROM   advert a0_ INNER JOIN
       (select ac.*
        from advertcategory ac
        where ac.id = 1136 or ac.parent_id = 1136
       ) ac
       ON a0_.advert_category_id = ac.id
ORDER  BY a0_.created_date DESC
LIMIT  15;  
于 2013-05-29T14:28:42.590 回答
3

“另外,如果我能做到这一点,为什么优化器不做呢?” -- 因为在各种情况下它不一定有效(由于子查询中的聚合)或有趣(由于更好的索引)这样做。

您可能会得到的最佳查询计划在 Gordon 的回答中给出,使用union all而不是union避免排序(我认为一个类别永远不是它自己的父级,消除了任何重复的可能性)。

否则,请注意您的查询可能会这样重写:

SELECT a0_.id AS id0
FROM   advert a0_
       INNER JOIN advertcategory a1_
               ON a0_.advert_category_id = a1_.id
WHERE  a1_.id IN ( 1136 )
        OR a1_.parent_id IN ( 1136 )
ORDER  BY a0_.created_date DESC
LIMIT  15;

换句话说,您正在根据一个表中的条件进行过滤,并根据另一个表进行排序/限制。您编写它的方式使您无法使用良好的索引,因为规划器没有认识到过滤条件都来自同一个表,因此它将使用您当前正在做的过滤器在 created_date 上嵌套循环. 请注意,这不是一个糟糕的计划……如果例如 1136 不是非常有选择性的标准,这实际上是正确的做法。

通过明确表示第二个表是感兴趣的表,当类别具有足够的选择性时,如果您有索引advertcategory (id)(如果它是主键,您已经有索引)和advertcategory (parent_id)(您目前可能没有)。不过不要太指望它——据我所知,PG 不会收集相关的列信息。

另一种可能性可能是直接在 advert 中维护一个包含聚合类别(使用触发器)的数组,并在其上使用 GIST 索引:

SELECT a0_.id AS id0
FROM   advert a0_
WHERE  ARRAY[1136, 1137] && a0_.category_ids -- 1136 OR 1137; use <@ for AND
ORDER  BY a0_.created_date DESC
LIMIT  15;

它在技术上是多余的,但它可以很好地优化这种查询(即过滤产生复杂连接条件的嵌套类别树)......当 PG 决定使用它时,您最终会得到 top-n 排序的适用广告。(在较旧的 PG 版本中,&& 的选择性是任意的,因为缺少统计信息;我隐约记得阅读过一个变更日志,其中 9.1、9.2 或 9.3 改进了一些东西,大概是通过使用类似于 tsvector 内容统计信息收集器用于通用数组类型的代码. 无论如何,请务必使用最新的 PG 版本,并确保不要使用gin/gist索引无法使用的运算符重写该查询。)

于 2013-05-29T14:28:51.360 回答