postgresql - 为什么查询计划如此不同？

Question

我有表reviews_article（1508 行）。我不明白，为什么只有不同偏移量的两个查询有不同的查询计划。你能给我解释一下吗？

CREATE TABLE "reviews_article" (
    "id" serial NOT NULL PRIMARY KEY,
    "version_id" integer REFERENCES "reversion_version" ("id") DEFERRABLE INITIALLY DEFERRED,
    "published" boolean NOT NULL,
    "created" timestamp with time zone NOT NULL,
);

CREATE INDEX reviews_article_version_id_7cf1d83d68e3a3c6 
    ON reviews_article ( version_id, published );

CREATE INDEX reviews_article_created_desc_index 
    ON reviews_article ( created DESC NULLS FIRST );

EXPLAIN ANALYZE SELECT ••• FROM "reviews_article" 
LEFT OUTER JOIN "reversion_version" ON ("reviews_article"."version_id" = "reversion_version"."id") 
WHERE ("reversion_version"."id" IS NOT NULL AND "reviews_article"."published" = true ) 
ORDER BY "reviews_article"."created" DESC LIMIT 8 OFFSET 304

有计划：

Limit  (cost=882.89..906.12 rows=8 width=1868) (actual time=3.630..3.717 rows=8 loops=1)
  ->  Nested Loop  (cost=0.00..3319.54 rows=1143 width=1868) (actual time=0.063..3.397 rows=312 loops=1)
        ->  Index Scan using reviews_article_created_desc_index on reviews_article  (cost=0.00..1128.75 rows=1254 width=863) (actual time=0.030..0.926 rows=393 loops=1)
              Filter: published
              Rows Removed by Filter: 6
        ->  Index Scan using reversion_version_pkey on reversion_version  (cost=0.00..1.74 rows=1 width=1005) (actual time=0.003..0.003 rows=1 loops=393)
              Index Cond: ((id = reviews_article.version_id) AND (id IS NOT NULL))
Total runtime: 3.769 ms

但同样的查询OFFSET 312有计划：

Limit  (cost=919.06..919.08 rows=8 width=1868) (actual time=16.974..16.987 rows=8 loops=1)
  ->  Sort  (cost=918.28..921.14 rows=1143 width=1868) (actual time=16.488..16.720 rows=320 loops=1)
        Sort Key: reviews_article.created
        Sort Method: top-N heapsort  Memory: 1154kB
        ->  Hash Join  (cost=369.75..865.01 rows=1143 width=1868) (actual time=4.895..13.286 rows=1136 loops=1)
              Hash Cond: (reversion_version.id = reviews_article.version_id)
              ->  Seq Scan on reversion_version  (cost=0.00..471.87 rows=3187 width=1005) (actual time=0.006..4.041 rows=3195 loops=1)
                    Filter: (id IS NOT NULL)
              ->  Hash  (cost=354.08..354.08 rows=1254 width=863) (actual time=4.877..4.877 rows=1136 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1057kB
                    ->  Seq Scan on reviews_article  (cost=0.00..354.08 rows=1254 width=863) (actual time=0.005..2.521 rows=1254 loops=1)
                          Filter: published
                          Rows Removed by Filter: 254
Total runtime: 17.058 ms

score 1 · Accepted Answer

通过索引扫描遍历整个表比使用顺序扫描更糟糕。在偏移量 304 和偏移量 312 之间的某个地方，规划器决定它会通过索引执行更多的 I/O 弹跳，而不是使用 seq 扫描咬子弹。

有一些可能性。如前所述，一个是SET enable_seqscan=0;. 另一种可能的解决方案是增加statistics该表的参数并重新分析。（我假设在此测试之前对表格进行了新的分析。）

顺便说一句LIMIT/OFFSET，如果您正在这样做，这种重复是否有利于对结果集进行分页尚不清楚。

postgresql - 为什么查询计划如此不同？

1 回答 1

Related

Reference