postgresql - postgresql-12 分区表性能问题

Question

它工作正常插入更新删除副本等按计划工作但由于某种原因选择很慢。这是分区表：

/** TABLE PARTITIONING EVENT RECORD **/

-- CREATE PROPER SCHEMA
CREATE SCHEMA IF NOT EXISTS test_par_pool;

-- CREATE PROPER TABLE

CREATE TABLE test_part
(
    id                   bigserial                              not null
        constraint test_part_pkey
            primary key,
    device_id            bigint,
    device_type          bigint,
    record_time          timestamp,
    module_serial_number bigint,
    module_id            bigint,
    message_type         bigint,
    event_code           bigint,
    device_status        bytea,
    sequence_number      bigint,
    data_bytes           bigint,
    device_data          bytea,
    active               boolean,
    deleted              boolean,
    created_time         timestamp                default now() not null,
    created_on           timestamp with time zone default now() not null,
    updated_on           timestamp with time zone default now() not null
);


-- CREATE MINIMAL INDEXES
CREATE INDEX idx_device_id
    ON public.test_part USING brin
    (device_id)
    TABLESPACE pg_default;

CREATE INDEX idx_module_id
    ON public.test_part USING brin
    (module_id)
    TABLESPACE pg_default;

CREATE INDEX idx_er_created_time
    ON public.test_part (cast(created_time as DATE));

-- CREATE INSERT FUNCTIONS

CREATE OR REPLACE FUNCTION test_par_insert_function()
    RETURNS TRIGGER AS
$$
DECLARE
    partition_date    TEXT;
    partition         TEXT;
    start_of_month    TEXT;
    end_of_next_month TEXT;
    stmt              TEXT;
BEGIN
    partition_date := to_char(NEW.created_time, 'YYYY_MM');
    partition := TG_RELNAME || '_' || partition_date;
    start_of_month := to_char((NEW.created_time), 'YYYY-MM') || '-01';
    end_of_next_month := to_char((NEW.created_time + interval '1 month'), 'YYYY-MM') || '-01';
    IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = partition) THEN
        RAISE NOTICE 'A partition has been created %',partition;
        stmt = 'CREATE TABLE test_par_pool.' || partition || ' (check (date_trunc(''day'', created_time) >= '
                   || chr(39) || start_of_month || chr(39)
                   || ' AND date_trunc(''day'', created_time) < '
                   || chr(39) || end_of_next_month
                   || chr(39) || ' )) INHERITS ( public.' || TG_RELNAME ||
               ');';
        EXECUTE stmt;
    END IF;
    EXECUTE 'INSERT INTO test_par_pool.' || partition ||
            ' SELECT( public.' || TG_RELNAME || ' ' || quote_literal(NEW) || ').* RETURNING id;';
    RETURN NULL;

END
$$
    LANGUAGE plpgsql;

-- CREATE TRIGGER

CREATE TRIGGER insert_test_part_trigger
    BEFORE INSERT ON public.test_part
    FOR EACH ROW EXECUTE PROCEDURE public.test_par_insert_function();

简单的分区表，尝试分析select count(*)；

数据库中有大约 150 万条记录。查询需要这么长时间是否有正当理由？

Explain analyse select count(*) from public.test_part;

Finalize Aggregate  (cost=41076.07..41076.08 rows=1 width=8) (actual time=243.842..243.842 rows=1 loops=1)
  ->  Gather  (cost=41075.75..41076.06 rows=3 width=8) (actual time=243.477..267.547 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        ->  Partial Aggregate  (cost=40075.75..40075.76 rows=1 width=8) (actual time=165.999..165.999 rows=1 loops=4)
              ->  Parallel Append  (cost=0.00..38793.96 rows=512716 width=0) (actual time=0.025..130.111 rows=397354 loops=4)
                    ->  Parallel Seq Scan on test_part_2019_11  (cost=0.00..11934.27 rows=171427 width=0) (actual time=0.022..41.875 rows=132856 loops=4)
                    ->  Parallel Seq Scan on test_part_2019_10  (cost=0.00..10984.80 rows=157780 width=0) (actual time=0.018..56.238 rows=244560 loops=2)
                    ->  Parallel Seq Scan on test_part_2019_12  (cost=0.00..8505.66 rows=151466 width=0) (actual time=0.017..47.168 rows=181759 loops=2)
                    ->  Parallel Seq Scan on test_part_2019_09  (cost=0.00..4805.65 rows=85565 width=0) (actual time=0.009..36.941 rows=205356 loops=1)
                    ->  Parallel Seq Scan on test_part (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
Planning Time: 0.179 ms
Execution Time: 267.587 ms

score 3 · Accepted Answer

分区表可以通过允许规划器在规划查询时消除分区来提高性能。

出于这个原因，您应该尝试始终在查询的 where 子句中包含您的分区键，以便它与尽可能少的分区重叠（理想情况下为 1）。

因为您正在执行不使用分区键作为过滤器的查询，所以 postgres 必须查询每个分区，这可能比仅使用单个大表还要慢。

我建议你阅读这个页面，因为它提供了很多关于 postgres 分区的好信息：https ://www.postgresql.org/docs/12/ddl-partitioning.html

postgresql - postgresql-12 分区表性能问题

1 回答 1

Related

Reference