
I am new to Postgres and have a question about using a partial index on columns that are updated frequently.

I have a huge table, job, with the following columns. It has nearly 50 million rows.

Table and index

CREATE TABLE job
(
  id uuid,
  assigned_at timestamp with time zone,
  completed_at timestamp with time zone
)

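The assigned_at column is updated when someone takes the job, and the completed_at column is updated when the job is completed, so the table is updated frequently.

I tried to create a partial index as follows: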

CREATE INDEX idx ON job (c_id) WHERE ((assigned_at IS NOT NULL) AND (completed_at IS NULL));

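Update query

Now I want to clear the assignment of jobs that have been assigned for more than 10 days. This is my query. It takes a very long time to execute: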

update job set assigned_at = null where completed_at is null and (now() - assigned_at) > INTERVAL '10 days'

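The index works well in the test environment but is not used in the production environment. I wonder whether the frequent operations in production prevent the partial index from being used? And how can I speed up the update query?

If anyone has any ideas on this, I would greatly appreciate it. Thanks.

EXPLAIN ANALYZE:

  1. In the test environment: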
                                                       QUERY PLAN         
-------------------------------------------------------------------------------------------------------------------------------------------------
 Update on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=31.447..31.448 rows=0 loops=1)
  -> Bitmap Heap Scan on job (cost=1592.47..1790362.54 rows=6083164 width=156) (actual time=1.180..6.174 rows=494 loops=1)
        Recheck Cond: ((assigned_at IS NOT NULL) AND (completed_at IS NULL))
        Filter: (assigned_at < (now() - '10 days'::interval))
        Rows Removed by Filter: 2585
        Heap Blocks: exact=2698
        -> Bitmap Index Scan on idx (cost=0.00..71.67 rows=7446475 width=0) (actual time=0.839..0.839 rows=3079 loops=1)
 Planning Time: 0.238 ms
 Execution Time: 31.487 ms
(9 rows)
  2. In the production environment:
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Update on job  (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=56346.519..56346.521 rows=0 loops=1)
   ->  Seq Scan on job  (cost=0.00..1773961.63 rows=2275384 width=156) (actual time=0.212..55583.427 rows=693 loops=1)
         Filter: ((assigned_at IS NOT NULL) AND (completed_at IS NULL) AND ((now() - assigned_at) > '10 days'::interval))
         Rows Removed by Filter: 47839353
 Planning Time: 0.640 ms
 Execution Time: 56346.582 ms

1 Answer


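Presumably you also want the index to be used to stop at 10 days, not just to apply the NULL criteria. For that, you need to write the query yourself with the indexed column standing alone on one side of the comparison, because PostgreSQL will not do the algebra for you: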

update job set assigned_at = null where completed_at is null and assigned_at < now() - INTERVAL '10 days'

And you want this index:

CREATE INDEX idx ON job (assigned_at) WHERE ((assigned_at IS NOT NULL) AND (completed_at IS NULL));
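On a table this large and this busy, a plain CREATE INDEX blocks writes to the table while it builds. A minimal sketch of a non-blocking build, assuming that matters in your production environment; the index name idx_job_assigned_open is hypothetical, chosen only to avoid clashing with the existing idx:

```sql
-- Hypothetical index name; the column and predicate are the ones shown above.
-- CONCURRENTLY builds the index without blocking writes to job, but it
-- cannot be run inside a transaction block and takes longer to finish.
CREATE INDEX CONCURRENTLY idx_job_assigned_open
    ON job (assigned_at)
    WHERE assigned_at IS NOT NULL AND completed_at IS NULL;

-- Refresh planner statistics so the row estimates reflect the current data.
ANALYZE job;
```

If a concurrent build fails partway through, it leaves an invalid index behind that has to be dropped before retrying.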

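"But it is not used in the production environment."

What does it do instead? Please show the output of EXPLAIN (ANALYZE, BUFFERS). Note that this will actually run the UPDATE.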
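Since EXPLAIN ANALYZE really executes the statement, one way to capture the plan without keeping the changes is to wrap it in a transaction and roll it back. A minimal sketch, assuming the rewritten query from above:

```sql
BEGIN;

-- ANALYZE executes the UPDATE for real; BUFFERS adds shared-buffer
-- hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE job
SET assigned_at = NULL
WHERE completed_at IS NULL
  AND assigned_at < now() - INTERVAL '10 days';

-- Discard the row changes made while measuring.
ROLLBACK;
```

The rollback undoes the row changes, but the affected rows stay locked until the transaction ends and the aborted update still leaves dead tuples for vacuum, so it is not entirely free on a busy table.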

Answered 2021-09-11T13:37:30.643