sql - Unpredictable query performance in Postgresql

Question

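I have tables like these in a Postgres 9.3 database:

A <1---n B n---1> C

Table A contains about 10^7 rows, table B is rather large with about 10^9 rows, and C contains about 100 rows.

I use the following query to find all (distinct) As that match certain conditions in B and C (the real query is more complex: it joins more tables and checks more attributes inside the subquery):

Query 1: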

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-12' and
                    C.Name = '00000015')
limit 200;

This query takes about 500 ms (note that C.Name = '00000015' exists in the table):

Limit  (cost=119656.37..120234.06 rows=200 width=9) (actual time=427.799..465.485 rows=200 loops=1)
  ->  Hash Semi Join  (cost=119656.37..483518.78 rows=125971 width=9) (actual time=427.797..465.460 rows=200 loops=1)
        Hash Cond: (a.id = b.aid)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.010..15.058 rows=133470 loops=1)
        ->  Hash  (cost=117588.73..117588.73 rows=125971 width=4) (actual time=427.233..427.233 rows=190920 loops=1)
              Buckets: 4096  Batches: 8  Memory Usage: 838kB
              ->  Nested Loop  (cost=0.57..117588.73 rows=125971 width=4) (actual time=0.176..400.326 rows=190920 loops=1)
                    ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.015..0.030 rows=1 loops=1)
                          Filter: (name = '00000015'::text)
                          Rows Removed by Filter: 149
                    ->  Index Only Scan using cid_aid on b  (cost=0.57..116291.64 rows=129422 width=8) (actual time=0.157..382.896 rows=190920 loops=1)
                          Index Cond: ((cid = c.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-12 00:00:00'::timestamp without time zone))
                          Heap Fetches: 0
Total runtime: 476.173 ms

Query 2: Changing C.Name to something that does not exist (C.Name = 'foo') takes 0.1 ms:

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-12' and
                    C.Name = 'foo')
limit 200;

Limit  (cost=119656.37..120234.06 rows=200 width=9) (actual time=0.063..0.063 rows=0 loops=1)
  ->  Hash Semi Join  (cost=119656.37..483518.78 rows=125971 width=9) (actual time=0.062..0.062 rows=0 loops=1)
        Hash Cond: (a.id = b.aid)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.010..0.010 rows=1 loops=1)
        ->  Hash  (cost=117588.73..117588.73 rows=125971 width=4) (actual time=0.038..0.038 rows=0 loops=1)
              Buckets: 4096  Batches: 8  Memory Usage: 0kB
              ->  Nested Loop  (cost=0.57..117588.73 rows=125971 width=4) (actual time=0.038..0.038 rows=0 loops=1)
                    ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.037..0.037 rows=0 loops=1)
                          Filter: (name = 'foo'::text)
                          Rows Removed by Filter: 150
                    ->  Index Only Scan using cid_aid on b  (cost=0.57..116291.64 rows=129422 width=8) (never executed)
                          Index Cond: ((cid = c.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-12 00:00:00'::timestamp without time zone))
                          Heap Fetches: 0
Total runtime: 0.120 ms

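Query 3: Setting C.Name back to a value that exists (as in the first query) and extending the timestamp range by 3 days produces a different query plan, but it is still fast (200 ms):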

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-15' and
                    C.Name = '00000015')
limit 200;

Limit  (cost=0.57..112656.93 rows=200 width=9) (actual time=4.404..227.569 rows=200 loops=1)
  ->  Nested Loop Semi Join  (cost=0.57..90347016.34 rows=160394 width=9) (actual time=4.403..227.544 rows=200 loops=1)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.008..1.046 rows=12250 loops=1)
        ->  Nested Loop  (cost=0.57..7.49 rows=1 width=4) (actual time=0.017..0.017 rows=0 loops=12250)
              ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.005..0.015 rows=1 loops=12250)
                    Filter: (name = '00000015'::text)
                    Rows Removed by Filter: 147
              ->  Index Only Scan using cid_aid on b  (cost=0.57..4.60 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12250)
                    Index Cond: ((cid = c.id) AND (aid = a.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-15 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
Total runtime: 227.632 ms

Query 4: But when searching for a C.Name that does not exist, the new query plan fails completely:

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-15' and
                    C.Name = 'foo')
limit 200;

Now it takes 170 seconds to return the same 0 rows (compared with 0.1 ms before!):

Limit  (cost=0.57..112656.93 rows=200 width=9) (actual time=170184.979..170184.979 rows=0 loops=1)
  ->  Nested Loop Semi Join  (cost=0.57..90347016.34 rows=160394 width=9) (actual time=170184.977..170184.977 rows=0 loops=1)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.008..794.626 rows=12020034 loops=1)
        ->  Nested Loop  (cost=0.57..7.49 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=12020034)
              ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=12020034)
                    Filter: (name = 'foo'::text)
                    Rows Removed by Filter: 150
              ->  Index Only Scan using cid_aid on b  (cost=0.57..4.60 rows=1 width=8) (never executed)
                    Index Cond: ((cid = c.id) AND (aid = a.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-15 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
Total runtime: 170185.033 ms

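All queries were run after setting the statistics target of every column to 10000 (alter table ... alter column ... set statistics 10000) and after running analyze on the whole database.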
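Spelled out, that setup corresponds roughly to the statements below. This is a sketch, not the exact script the author ran (it is not given): it uses the column names from Edit #1 and shows only the columns of b that the query filters on. 10000 is the largest per-column statistics target PostgreSQL accepts.

```sql
-- Raise the per-column statistics target (10000 is the maximum allowed value)
alter table b alter column aid set statistics 10000;
alter table b alter column cid set statistics 10000;
alter table b alter column "timestamp" set statistics 10000;
-- ...and likewise for the columns of a and c

-- Re-collect planner statistics for every table in the current database
analyze;
```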

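It now looks as if the slightest change to a parameter value (not even to the SQL itself) can make Postgres choose a terrible plan (0.1 ms vs. 170 seconds in this case!). I always try to check the query plan when I change something, but when tiny changes to a parameter make such a huge difference, it is hard to be sure that anything will work. I have similar problems with other queries.

What can I do to get more predictable results?

(I have tried changing some planner parameters (set enable_... = on/off) and several different SQL formulations, such as join + distinct/group by instead of "exists", but nothing seems to get Postgres to choose a "stable" query plan that still delivers acceptable performance.)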

Edit #1: Table and index definitions

test=# \d a
                          Table "public.a"
 Column |  Type   |                     Modifiers
--------+---------+----------------------------------------------------
 id     | integer | not null default nextval('a_id_seq'::regclass)
 anr    | integer |
 snr    | text    |
Indexes:
    "a_pkey" PRIMARY KEY, btree (id)
    "anr_snr_index" UNIQUE, btree (anr, snr)
    "anr_index" btree (anr)
Foreign-key constraints:
    "anr_fkey" FOREIGN KEY (anr) REFERENCES pt(id)
Referenced by:
    TABLE "b" CONSTRAINT "aid_fkey" FOREIGN KEY (aid) REFERENCES a(id)


test=# \d b
                 Table "public.b"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 id        | uuid                        | not null
 timestamp | timestamp without time zone |
 cid       | integer                     |
 aid       | integer                     |
 prop1     | text                        |
 propn     | integer                     |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)
    "aid_cid" btree (aid, cid)
    "cid_aid" btree (cid, aid, "timestamp")
    "timestamp_index" btree ("timestamp")
Foreign-key constraints:
    "aid_fkey" FOREIGN KEY (aid) REFERENCES a(id)
    "cid_fkey" FOREIGN KEY (cid) REFERENCES c(id)


test=# \d c
                          Table "public.c"
 Column |  Type   |                     Modifiers
--------+---------+----------------------------------------------------
 id     | integer | not null default nextval('c_id_seq'::regclass)
 name   | text    |
Indexes:
    "c_pkey" PRIMARY KEY, btree (id)
    "c_name_index" UNIQUE, btree (name)
Referenced by:
    TABLE "b" CONSTRAINT "cid_fkey" FOREIGN KEY (cid) REFERENCES c(id)

score 1 · Accepted Answer

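Your problem is that the query has to evaluate the correlated subquery against the whole of table a. When Postgres quickly finds 200 random rows that fit (which seems to happen some of the time when c.name exists), it returns them as it finds them, and that is reasonably fast when there are plenty to choose from. But when no such rows exist, it evaluates the entire hogwash inside exists() once for every row in table a (loops=12020034 in the query 4 plan), hence the performance problem you are seeing.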
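As a rough check of this explanation against the query 4 plan: the inner Nested Loop shows loops=12020034 (one per row of a) at about 0.013 ms per loop.

```latex
\underbrace{12\,020\,034}_{\text{loops}} \times \underbrace{0.013\ \text{ms}}_{\text{per loop}}
  \approx 156\,260\ \text{ms} \approx 156\ \text{s}
```

Adding the 794.6 ms sequential scan of a gives about 157 s, in the same range as the measured 170 s; EXPLAIN ANALYZE rounds per-loop times to three decimals and adds its own timing overhead, so an exact match is not expected. In query 2 the same b/c subtree runs with loops=1, which is why an empty result costs only 0.1 ms there.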

Adding an uncorrelated where clause will most certainly fix some of the edge cases:

and exists(select 1 from c where name = ?)
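Applied to query 4 from the question, the suggestion might look like the sketch below. The temp tables are a hypothetical miniature of the schema from Edit #1, included only so the statements run on their own (use a scratch session, since temp tables named a, b, c shadow the real ones). On a handful of rows the planner will not make the same choices as on the real data, so this only demonstrates that the guard leaves the result unchanged.

```sql
-- Hypothetical miniature of the schema from Edit #1 (temp tables, scratch use only)
create temp table c (id int primary key, name text unique);
create temp table a (id int primary key, snr text);
create temp table b (aid int, cid int, "timestamp" timestamp);
insert into c values (1, '00000015');
insert into a values (1, 'snr-1'), (2, 'snr-2');
insert into b values (1, 1, '2013-01-05');

-- Query 4 plus an uncorrelated guard. The first exists() does not reference a,
-- so PostgreSQL typically evaluates it once (as an InitPlan); if it is false,
-- the scan of a and the correlated exists() are never run.
select a.snr
from a
where exists (select 1 from c where c.name = 'foo')
  and exists (select 1
              from b, c
              where b.aid = a.id
                and b.cid = c.id
                and b.timestamp >= '2013-01-01'
                and b.timestamp <= '2013-01-15'
                and c.name = 'foo')
limit 200;
```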

It might also work if you join that with b and write it as a CTE:

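with bc as (
  -- all b rows in the time range whose c row has the requested name
  select b.aid
  from b
  join c on b.cid = c.id
  where b.timestamp between ? and ?
    and c.name = ?
)
select a.id
from a
where exists (select 1 from bc)                      -- uncorrelated: stops early if bc is empty
  and exists (select 1 from bc where a.id = bc.aid)  -- correlated per-row check
limit 200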

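If not, just inline the bc query verbatim instead of using a CTE. The point is to force Postgres to treat the bc lookup as independent, and to give up early if that result set yields no rows at all.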
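Inlined, the CTE version becomes two copies of the b/c join: one uncorrelated (the guard) and one correlated with a. The sketch below substitutes query 4's literal values for the ? placeholders and again uses a hypothetical temp-table miniature of the schema so it runs on its own. One relevant detail for 9.3: a WITH query there is always evaluated as a separate, materialized step (an optimization fence that PostgreSQL 12 relaxed), so inlining gives the planner freedom to plan each exists() on its own.

```sql
-- Hypothetical miniature of the schema from Edit #1 (temp tables, scratch use only)
create temp table c (id int primary key, name text unique);
create temp table a (id int primary key, snr text);
create temp table b (aid int, cid int, "timestamp" timestamp);
insert into c values (1, '00000015');
insert into a values (1, 'snr-1'), (2, 'snr-2');
insert into b values (1, 1, '2013-01-05');

select a.id
from a
-- guard: does any b row in the range belong to a c row with this name at all?
where exists (select 1
              from b join c on b.cid = c.id
              where b.timestamp between '2013-01-01' and '2013-01-15'
                and c.name = 'foo')
  -- per-row check, only reached when the guard is true
  and exists (select 1
              from b join c on b.cid = c.id
              where b.aid = a.id
                and b.timestamp between '2013-01-01' and '2013-01-15'
                and c.name = 'foo')
limit 200;
```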

I assume your query will end up being more complex, but note that the above could be rewritten as:

with bc as (...)
select aid
from bc
limit 200

Or:

with bc as (...)
select a.id
from a
where a.id in (select aid from bc)
limit 200

Both should yield better plans in the edge cases.

(Side note: using limit without order by is generally not advisable.)
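A sketch of the side note applied to query 1, on the same kind of hypothetical temp-table miniature (included only so it runs on its own): ordering by the primary key a.id makes "the first 200 rows" well defined, so repeated runs return the same rows.

```sql
-- Hypothetical miniature of the schema from Edit #1 (temp tables, scratch use only)
create temp table c (id int primary key, name text unique);
create temp table a (id int primary key, snr text);
create temp table b (aid int, cid int, "timestamp" timestamp);
insert into c values (1, '00000015');
insert into a values (1, 'snr-1'), (2, 'snr-2');
insert into b values (1, 1, '2013-01-05'), (2, 1, '2013-01-10');

select a.snr
from a
where exists (select 1
              from b, c
              where b.aid = a.id
                and b.cid = c.id
                and b.timestamp >= '2013-01-01'
                and b.timestamp <= '2013-01-12'
                and c.name = '00000015')
order by a.id   -- unique key, so the row order (and the 200-row cut) is deterministic
limit 200;
```

Because a has a primary-key index on id (a_pkey in Edit #1), the planner can in principle read a in id order and still stop after 200 matches, but adding order by can also lead it to a different plan, so it is worth re-checking with explain analyze.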

score 0 · Accepted Answer

Maybe try rewriting the query with a CTE?

with BC as (
    select distinct B.AId from B where
    B.Timestamp >= '2013-01-01' and
    B.Timestamp <= '2013-01-12' and
    B.CId in (select C.Id from C where C.Name = '00000015')
    limit 200
)

select A.SNr from A where A.Id in (select AId from BC)

If I understand correctly, the limit can easily be moved into the BC query, which avoids scanning table A.
