3

我正在尝试减少下面给出的查询的查询执行时间。它连接了 3 个表以从非常大的 Postgres 表中获取数据,我试图在相关表上引入所有必要的索引,但查询仍然花费了太长时间。数据库的总大小约为 2TB。询问:

explain (ANALYZE, COSTS, VERBOSE, BUFFERS)
    with au as (
         select tbl2.client, tbl2.uid
         from tbl2 where tbl2.client = '123kkjk444kjkhj3ddd'
         and  (tbl2.property->>'num') IN ('1', '2', '3', '31', '12a', '45', '78', '99')
             
     )
       SELECT tbl1.id,
            CASE WHEN tbl3.displayname IS NOT NULL THEN tbl3.displayname ELSE tbl1.name END AS name,
            tbl1.tbl3number, tbl3.originalname as orgtbl3
        FROM  table_1 tbl1
        inner JOIN au tbl2 ON tbl2.client = '123kkjk444kjkhj3ddd' AND tbl2.uid = tbl1.uid
        LEFT JOIN tbl3 ON tbl3.client = '123kkjk444kjkhj3ddd' AND tbl3.originalname = tbl1.name
        WHERE tbl1.client = '123kkjk444kjkhj3ddd'
            AND tbl1.date_col BETWEEN '2021-08-01T05:32:40Z' AND '2021-08-29T05:32:40Z'
        ORDER BY tbl1.date_col DESC, tbl1.sid, tbl1.tbl3number
        LIMIT 50000;

我有上面的查询正在运行,但是即使在索引扫描之后查询执行也很慢。我已附上查询计划。查询计划:

->  Limit  (cost=7272.83..7272.86 rows=14 width=158) (actual time=40004.140..40055.737 rows=871 loops=1)
               Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, tbl3.originalsc
reenname, tbl1.date_col
               Buffers: shared hit=249656881 dirtied=32
               ->  Sort  (cost=7272.83..7272.86 rows=14 width=158) (actual time=40004.139..40055.671 rows=871 loops=1)
                     Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, tbl3.orig
inalname, tbl1.date_col
                     Sort Key: tbl1.date_col DESC, tbl1.id, tbl1.tbl3number
                     Sort Method: quicksort  Memory: 142kB
                     Buffers: shared hit=249656881 dirtied=32
                     ->  Gather  (cost=1001.39..7272.56 rows=14 width=158) (actual time=9147.574..40055.005 rows=871 loops=1)
                           Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, scree
n.originalname, tbl1.date_col
                           Workers Planned: 4
                           Workers Launched: 4
                           Buffers: shared hit=249656881 dirtied=32
                           ->  Nested Loop Left Join  (cost=1.39..6271.16 rows=4 width=158) (actual time=3890.074..39998.436 rows=174 loops=5)
                                 Output: tbl1.id, CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END, tbl1.tbl3number, s
creen.originalname, tbl1.date_col
                                 Inner Unique: true
                                 Buffers: shared hit=249656881 dirtied=32
                                 Worker 0: actual time=1844.246..39996.744 rows=182 loops=1
                                   Buffers: shared hit=49568277 dirtied=5
                                 Worker 1: actual time=3569.032..39997.124 rows=210 loops=1
                                   Buffers: shared hit=49968461 dirtied=10
                                 Worker 2: actual time=2444.911..39997.561 rows=197 loops=1
                                   Buffers: shared hit=49991521 dirtied=2
                                 Worker 3: actual time=2445.013..39998.065 rows=110 loops=1
                                   Buffers: shared hit=49670445 dirtied=10
                                 ->  Nested Loop  (cost=1.12..6269.94 rows=4 width=610) (actual time=3890.035..39997.924 rows=174 loops=5)
                                       Output: tbl1.id, tbl1.name, tbl1.tbl3number, tbl1.date_col
                                       Inner Unique: true
                                       Buffers: shared hit=249655135 dirtied=32
                                       Worker 0: actual time=1844.200..39996.206 rows=182 loops=1
                                         Buffers: shared hit=49567912 dirtied=5
                                       Worker 1: actual time=3568.980..39996.522 rows=210 loops=1
                                         Buffers: shared hit=49968040 dirtied=10
                                       Worker 2: actual time=2444.872..39996.987 rows=197 loops=1
                                         Buffers: shared hit=49991126 dirtied=2
                                       Worker 3: actual time=2444.965..39997.712 rows=110 loops=1
                                         Buffers: shared hit=49670224 dirtied=10
                                       ->  Parallel Index Only Scan using idx_sv_cuf8_110523 on public.table_1_110523 tbl1  (cost=0.69..5692.16 rows=220 width=692) (actual time=0.059..1458.129 rows=2922506 loops=5)
                                             Output: tbl1.client, tbl1.id, tbl1.tbl3number, tbl1.date_col, tbl1.id, tbl1.name
                                             Index Cond: ((tbl1.client = '123kkjk444kjkhj3ddd'::text) AND (tbl1.date_col >= '2021-08-01 05:32:40+00'::timestamp with time zone) AND (tbl1.date_col <= '2021-08-29 05:32:40+00'::timestamp with time zone))
                                             Heap Fetches: 0
                                             Buffers: shared hit=538663
                                             Worker 0: actual time=0.059..1479.907 rows=2912875 loops=1
                                               Buffers: shared hit=107477
                                             Worker 1: actual time=0.100..1475.863 rows=2930306 loops=1
                                               Buffers: shared hit=107817
                                             Worker 2: actual time=0.054..1481.032 rows=2925849 loops=1
                                               Buffers: shared hit=107812
                                             Worker 3: actual time=0.058..1477.443 rows=2897544 loops=1
                                               Buffers: shared hit=107047
                                       ->  Index Scan using tbl2_pkey_102328 on public.tbl2_102328 tbl2_1  (cost=0.43..2.63 rows=1 width=25) (actual time=0.013..0.013 rows=0 loops=14612531)
                                             Output: tbl2_1.id
                                             Index Cond: (((tbl2_1.id)::text = (tbl1.id)::text) AND ((tbl2_1.client)::text = '123kkjk444kjkhj3ddd'::text))
                                             Filter: ((tbl2_1.property ->> 'num'::text) = ANY ('{"1","2","3","31","12a","45","78","99"}'::text[]))
                                             Rows Removed by Filter: 1
                                             Buffers: shared hit=249116472 dirtied=32
                                             Worker 0: actual time=0.013..0.013 rows=0 loops=2912875
                                               Buffers: shared hit=49460435 dirtied=5
                                             Worker 1: actual time=0.013..0.013 rows=0 loops=2930306
                                               Buffers: shared hit=49860223 dirtied=10
                                             Worker 2: actual time=0.013..0.013 rows=0 loops=2925849
                                               Buffers: shared hit=49883314 dirtied=2
                                             Worker 3: actual time=0.013..0.013 rows=0 loops=2897544
                                               Buffers: shared hit=49563177 dirtied=10
                                 ->  Index Scan using tbl3_unikey_104219 on public.tbl3_104219 tbl3  (cost=0.27..0.30 rows=1 width=52) (actual time=0.002..0.002 rows=0 loops=871)
                                       Output: tbl3.client, tbl3.originalname, tbl3.displayname
                                       Index Cond: (((tbl3.client)::text = '123kkjk444kjkhj3ddd'::text) AND ((tbl3.originalname)::text = (tbl1.name)::text))
                                       Buffers: shared hit=1746
                                       Worker 0: actual time=0.002..0.002 rows=0 loops=182
                                         Buffers: shared hit=365
                                       Worker 1: actual time=0.002..0.002 rows=0 loops=210
                                         Buffers: shared hit=421
                                       Worker 2: actual time=0.002..0.002 rows=0 loops=197
                                         Buffers: shared hit=395
                                       Worker 3: actual time=0.002..0.002 rows=0 loops=110
                                         Buffers: shared hit=221
             Planning Time: 0.361 ms
             Execution Time: 40056.008 ms
 Planning Time: 0.589 ms
 Execution Time: 40071.485 ms
(89 rows)

Time: 40072.986 ms (00:40.073)

是否可以进一步优化此查询以减少查询执行时间?预先感谢您的意见。

表定义如下:

                                               Table "public.tbl1"
      Column       |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 client            | character varying(32)       |           | not null |         | extended |              | 
 sid               | character varying(32)       |           | not null |         | extended |              | 
 uid               | character varying(32)       |           |          |         | extended |              | 
  id               | character varying(32)       |           |          |         | extended |              | 
 tbl3number        | integer                     |           | not null |         | plain    |              | 
 name              | character varying(255)      |           |          |         | extended |              | 
 date_col          | timestamp without time zone |           |          |         | plain    |              | 
Indexes:
idx_sv_cuf8_110523(client,date_col desc,sid,tbl3number)

    
                                                             Table "public.tbl2"
          Column          |            Type             | Collation | Nullable |         Default         | Storage  | Stats target | Description 
--------------------------+-----------------------------+-----------+----------+-------------------------+----------+--------------+-------------
 id                       | character varying(32)       |           | not null |                         | extended |              | 
 uid                      | character varying(255)      |           |          | NULL::character varying | extended |              | 
 client                   | character varying(32)       |           | not null |                         | extended |              | 
 property                 | jsonb                       |           |          |                         | extended |              | 
 Indexes:
    "tbl2_pkey" PRIMARY KEY, btree (uid, client)


--
                                                Table "public.tbl3"
       Column       |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 client              | character varying(500) |           | not null |         | extended |              | 
 originalname        | character varying(500) |           |          |         | extended |              | 
 displayname         | character varying(500) |           |          |         | extended |              | 
Indexes:
    "tbl3_unikey" UNIQUE CONSTRAINT, btree (client, originalname)
4

3 回答 3

3

tl; dr:多列覆盖索引

查询清晰度

我偏爱使用严格的查询格式,因此更容易查看正在处理的列和表。出于同样的原因,我删除了您的 CTE 并将其条件移至主查询。我还删除了多个相同的客户端 ID 常量。这是我的重写。

SELECT tbl1.id,
       COALESCE(tbl3.displayname, tbl1.name) AS name,
       tbl1.tbl3number, 
       tbl3.originalname as orgtbl3
       FROM table_1 tbl1
 INNER JOIN tbl2
              ON tbl2.client = tbl1.client
             AND tbl2.uid = tbl1.uid
             AND (tbl2.property->>'num') IN ('1', '2', '3', '31', '12a', '45', '78', '99')
 LEFT JOIN tbl3 
             ON tbl3.client = tbl1.client 
            AND tbl3.originalname = tbl1.name
 WHERE tbl1.client = '123kkjk444kjkhj3ddd'
   AND tbl1.date_col BETWEEN '2021-08-01T05:32:40Z' AND '2021-08-29T05:32:40Z'
 ORDER BY tbl1.date_col DESC, tbl1.sid, tbl1.tbl3number
 LIMIT 50000;

ORDER BY ... 限制 ...

当您 ORDER BY then LIMIT 时,有时会强制服务器进行大量数据混洗:对结果集进行排序然后丢弃其中的一些。你能避免 ORDER BY 或 LIMIT,或两者都避免吗?

在由 DESC 排序的列的索引上使用 DESC 关键字也可能会有所帮助。

覆盖索引

这是一个很大的查询。但我相信多列覆盖索引的明智选择将有助于加快速度。

tbl您通过对 的持续比较client和范围扫描进行过滤date_col。然后使用uid并输出idnametbl3number。因此,此 BTREE 索引将允许仅索引范围扫描,这通常很快。(注意 DESC 上的关键字date_col。它试图帮助您的 ORDER BY 子句。)

CREATE INDEX CONCURRENTLY tbl1_name_num_lookup 
          ON tbl1         (client, date_col DESC)
                  INCLUDE (uid, id, name, tbl3number);

tbl2,您访问clientand uid,然后使用 jsonb 列property。所以这个索引可能会对你有所帮助。

CREATE INDEX CONCURRENTLY tbl2_name_num_lookup 
          ON tbl2          (client, uid)
                   INCLUDE (property);

从,您通过和tbl3访问它。你输出. 所以这个索引应该会有所帮助。clientoriginalnamedisplayname

CREATE INDEX CONCURRENTLY tbl3_name_num_lookup 
          ON tbl3         (client, originalname)
                  INCLUDE (displayname);

连接列类型不匹配

你加入ON tbl2.uid = tbl1.uid。但是这两列有不同的数据类型:character varying(32)intbl1和 255 in tbl2。当 ON 列具有相同的数据类型时,JOIN 操作会更快。考虑修复其中一个。

也是如此ON tbl3.originalname = tbl1.name

于 2021-09-15T12:14:51.487 回答
1

您有两个别名为 tbl2 的不同事物当然不会提高您计划的可读性。您的计划缩进过多,因此我们需要继续左右滚动才能看到它也无济于事。

为什么您的计划显示(tbl2_1.id)::text = (tbl1.id)::text而您的查询显示tbl2.uid = tbl1.uid?是不是有点匿名化了?

基本上所有时间都在 tbl1 和 tbl2 之间进行连接,所以这是您需要优化的地方。如果您消除与 tbl3 的连接,那将简化 EXPLAIN,从而使其更易于理解。

您正在访问 tbl2 1400 万次,但只获得 174 行。但是我们无法判断索引是否为每 1400 万个输入找到一行,但它会被过滤掉,或者它平均找到 0 行。也许颠倒该连接的顺序会更有效,您可以通过创建 index 来做到这一点on tbl2 (client, (property->>'num'),uid)。或者也许是“id”而不是“uid”,我真的不知道你真正的查询是什么。

于 2021-09-15T14:23:40.457 回答
-1

您的第一个查询使用 JSON 并在 JSON 结构内操作过滤器(限制)以查找数据:

tbl2.property->>'num'

WHERE 谓词的这一部分不是“可分析的”。因此,满足您需求的唯一方法是扫描表 tbl2 中的每一行,然后对每一行扫描 json 文本流以找到所需的值。

所以迭代是表的行基数和部分 JSON 之间的一种叉积。

没有办法优化这样的查询......

每次您将在查询时引入一个具有迭代功能的对象(在您的查询中为 JSON),在可以使用基于集合的算法(索引、并行性...)检索的数据集中,结果是扫描和扫描,并扫描...

实际上 JSON 不能被索引。PostgreSQL 不接受 JSON 索引或 XML 索引,这与能够在 XML 上创建专用索引的 DB2、Oracle 或 SQL Server 不同...

于 2021-09-15T11:49:19.453 回答