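I have a huge database (currently about 900GB, with new data still coming in), partitioned by year and month (Year_month) and sub-partitioned by currency. The problem is that fetching aggregates over a whole partition is slow. This is a report, so it will be queried very often. The partition I want to aggregate currently holds 7,829,230 rows, and every sub-partition will be of similar size. Table schema (anonymized):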
CREATE TABLE aggregates_dates
(
id char(1) DEFAULT '' NOT NULL,
date TIMESTAMP(0) NOT NULL,
currency CHAR(3) NOT NULL,
field01 INTEGER NOT NULL,
field02 INTEGER NOT NULL,
field03 INTEGER NOT NULL,
field04 INTEGER NOT NULL,
field05 INTEGER NOT NULL,
field06 CHAR(2) NOT NULL,
field07 INTEGER DEFAULT 0 NOT NULL,
field08 INTEGER DEFAULT 0 NOT NULL,
field09 INTEGER DEFAULT 0 NOT NULL,
field10 INTEGER DEFAULT 0 NOT NULL,
field11 INTEGER DEFAULT 0 NOT NULL,
value01 INTEGER DEFAULT 0 NOT NULL,
value02 INTEGER DEFAULT 0 NOT NULL,
value03 INTEGER DEFAULT 0 NOT NULL,
value04 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value05 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value06 INTEGER DEFAULT 0 NOT NULL,
value07 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value08 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value09 INTEGER DEFAULT 0 NOT NULL,
value10 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value11 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value12 INTEGER DEFAULT 0 NOT NULL,
value13 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value14 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value15 INTEGER DEFAULT 0 NOT NULL,
value16 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value17 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value18 NUMERIC(24, 12) DEFAULT '0'::NUMERIC NOT NULL,
value19 INTEGER DEFAULT 0,
value20 INTEGER DEFAULT 0,
CONSTRAINT aggregates_dates_pkey
PRIMARY KEY (id, date, currency)
)
PARTITION BY RANGE (date);
CREATE TABLE aggregates_dates_2020_01
PARTITION OF aggregates_dates
FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-01-31 23:59:59')
PARTITION BY LIST (currency);
CREATE TABLE aggregates_dates_2020_01_eur
PARTITION OF aggregates_dates_2020_01
FOR VALUES IN ('EUR');
CREATE INDEX aggregates_dates_2020_01_eur_date_idx ON aggregates_dates_2020_01_eur (date);
CREATE INDEX aggregates_dates_2020_01_eur_field01_idx ON aggregates_dates_2020_01_eur (field01);
CREATE INDEX aggregates_dates_2020_01_eur_field02_idx ON aggregates_dates_2020_01_eur (field02);
CREATE INDEX aggregates_dates_2020_01_eur_field03_idx ON aggregates_dates_2020_01_eur (field03);
CREATE INDEX aggregates_dates_2020_01_eur_field04_idx ON aggregates_dates_2020_01_eur (field04);
CREATE INDEX aggregates_dates_2020_01_eur_field06_idx ON aggregates_dates_2020_01_eur (field06);
CREATE INDEX aggregates_dates_2020_01_eur_currency_idx ON aggregates_dates_2020_01_eur (currency);
CREATE INDEX aggregates_dates_2020_01_eur_field09_idx ON aggregates_dates_2020_01_eur (field09);
CREATE INDEX aggregates_dates_2020_01_eur_field10_idx ON aggregates_dates_2020_01_eur (field10);
CREATE INDEX aggregates_dates_2020_01_eur_field11_idx ON aggregates_dates_2020_01_eur (field11);
CREATE INDEX aggregates_dates_2020_01_eur_field05_idx ON aggregates_dates_2020_01_eur (field05);
CREATE INDEX aggregates_dates_2020_01_eur_field07_idx ON aggregates_dates_2020_01_eur (field07);
CREATE INDEX aggregates_dates_2020_01_eur_field08_idx ON aggregates_dates_2020_01_eur (field08);
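A side note on the partition bounds above, as a sketch rather than a confirmed fix: in PostgreSQL range partitioning the FROM bound is inclusive and the TO bound is exclusive, so a partition declared up to '2020-01-31 23:59:59' would not accept rows stamped exactly at that second. A half-open monthly range, assuming the next partition starts at '2020-02-01 00:00:00', would look like this:

```sql
-- Sketch only: same layout as above, but with a half-open monthly range.
-- FROM is inclusive, TO is exclusive, so this covers all of January 2020.
CREATE TABLE aggregates_dates_2020_01
    PARTITION OF aggregates_dates
    FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-02-01 00:00:00')
    PARTITION BY LIST (currency);
```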
Example query aggregating the whole partition (not all fields are used). The query may have more WHERE conditions, but this is the worst case:
EXPLAIN (ANALYSE, BUFFERS, VERBOSE) SELECT
COALESCE(SUM(mainTable.value01), 0) AS "value01",
COALESCE(SUM(mainTable.value02), 0) AS "value02",
COALESCE(SUM(mainTable.value03), 0) AS "value03",
COALESCE(SUM(mainTable.value06), 0) AS "value06",
COALESCE(SUM(mainTable.value09), 0) AS "value09",
COALESCE(SUM(mainTable.value12), 0) AS "value12",
COALESCE(SUM(mainTable.value15), 0) AS "value15",
COALESCE(SUM(mainTable.value03 + mainTable.value06 + mainTable.value09 + mainTable.value12 +
mainTable.value15), 0) AS "kpi01",
COALESCE(SUM(mainTable.value05) * 1, 0) "value05",
COALESCE(SUM(mainTable.value08) * 1, 0) "value08",
COALESCE(SUM(mainTable.value11) * 1, 0) "value11",
COALESCE(SUM(mainTable.value14) * 1, 0) "value14",
COALESCE(SUM(mainTable.value17) * 1, 0) "value17",
COALESCE(SUM(mainTable.value05 + mainTable.value08 + mainTable.value11 + mainTable.value14 +
mainTable.value17) * 1, 0) "kpi02",
CASE
WHEN SUM(mainTable.value02) > 0 THEN (1.0 * SUM(
mainTable.value05 + mainTable.value08 + mainTable.value11 +
mainTable.value14 + mainTable.value17) / SUM(mainTable.value02) * 1000 * 1)
ELSE 0 END "kpiEpm",
CASE
WHEN SUM(mainTable.value01) > 0 THEN (1.0 * SUM(
mainTable.value05 + mainTable.value08 + mainTable.value11 +
mainTable.value14) / SUM(mainTable.value01) * 1)
ELSE 0 END
FROM aggregates_dates mainTable
WHERE (mainTable.date BETWEEN '2020-01-01 00:00:00' AND '2020-02-01 00:00:00')
AND (mainTable.currency = 'EUR')
GROUP BY mainTable.field02;
Explain output:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|HashAggregate (cost=3748444.51..3748502.07 rows=794 width=324) (actual time=10339.771..10340.497 rows=438 loops=1) |
| Group Key: maintable.field02 |
| Batches: 1 Memory Usage: 1065kB |
| Buffers: shared hit=2445343 |
| -> Append (cost=0.00..2706608.65 rows=11575954 width=47) (actual time=212.934..4549.921 rows=7829230 loops=1) |
| Buffers: shared hit=2445343 |
| -> Seq Scan on aggregates_2020_01 maintable_1 (cost=0.00..2646928.38 rows=11570479 width=47) (actual time=212.933..4055.104 rows=7823923 loops=1) |
| Filter: ((date >= '2020-01-01 00:00:00'::timestamp without time zone) AND (date <= '2020-02-01 00:00:00'::timestamp without time zone) AND (currency = 'EUR'::bpchar))|
| Buffers: shared hit=2444445 |
| -> Index Scan using aggregates_2020_02_date_idx on aggregates_2020_02 maintable_2 (cost=0.56..1800.50 rows=5475 width=47) (actual time=0.036..6.476 rows=5307 loops=1) |
| Index Cond: ((date >= '2020-01-01 00:00:00'::timestamp without time zone) AND (date <= '2020-02-01 00:00:00'::timestamp without time zone)) |
| Filter: (currency = 'EUR'::bpchar) |
| Rows Removed by Filter: 31842 |
| Buffers: shared hit=898 |
|Planning Time: 0.740 ms |
|JIT: |
| Functions: 15 |
| Options: Inlining true, Optimization true, Expressions true, Deforming true |
| Timing: Generation 4.954 ms, Inlining 14.249 ms, Optimization 121.115 ms, Emission 77.181 ms, Total 217.498 ms |
|Execution Time: 10345.662 ms |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
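One detail visible in the plan: besides the January partition, it also reads aggregates_2020_02 (5307 rows kept, 31842 removed by the filter). BETWEEN is inclusive on both ends, so the upper bound '2020-02-01 00:00:00' matches rows in the next month's partition. A minimal sketch of the same query with a half-open date range, assuming rows stamped exactly at '2020-02-01 00:00:00' belong to the February report (only two of the aggregates are shown to keep it short):

```sql
-- Sketch only: half-open range so that partition pruning can skip February.
-- This changes the result if rows at exactly '2020-02-01 00:00:00' were
-- meant to be included.
EXPLAIN (ANALYZE, BUFFERS)
SELECT mainTable.field02,
       COALESCE(SUM(mainTable.value01), 0) AS "value01",
       COALESCE(SUM(mainTable.value02), 0) AS "value02"
FROM aggregates_dates mainTable
WHERE mainTable.date >= '2020-01-01 00:00:00'
  AND mainTable.date <  '2020-02-01 00:00:00'
  AND mainTable.currency = 'EUR'
GROUP BY mainTable.field02;
```

The second scan accounts for only a few milliseconds here, so this is unlikely to be the main cost; most of the time goes into the sequential scan and the aggregation itself.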
Server specs:
- AMD, 64 threads
- 315GB RAM
- 6x SSD, RAID 10
Postgres config:
postgresql_autovacuum_vacuum_scale_factor: 0.4
postgresql_checkpoint_completion_target: 0.9
postgresql_checkpoint_timeout: 10min
postgresql_effective_cache_size: 240GB
postgresql_maintenance_work_mem: 2GB
postgresql_random_page_cost: 1.0
postgresql_shared_buffers: 80GB
postgresql_synchronous_commit: local
postgresql_work_mem: 1GB
[Update 2021-04-27]
I have updated the server config:
postgresql_max_worker_processes: 64
postgresql_max_parallel_workers_per_gather: 32
postgresql_max_parallel_workers: 64
postgresql_max_parallel_maintenance_workers: 4
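To confirm that these values are actually in effect on the running server, they can be read back from pg_settings. This is a sketch that assumes the postgresql_* keys above map to the PostgreSQL parameters of the same name without the prefix. Note that max_worker_processes only takes effect after a server restart.

```sql
-- Read back the parallelism settings as seen by the current session.
-- pending_restart = true means the new value has not been applied yet.
SELECT name, setting, source, pending_restart
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'max_parallel_maintenance_workers')
ORDER BY name;
```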
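For the full query I have an example on production data (longer: it aggregates all of the table's value fields). It does not run any faster and does not use parallelism (because of the large SELECT list?). When I reduce the number of aggregates in the SELECT, the planner starts using parallel workers and performance improves a lot. When I restore the original query, it stops using parallelism again.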
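One way to check whether the planner rejects the parallel plan on cost grounds, or cannot build one at all, is to zero out the parallel cost parameters for a single transaction and compare the plans. This is a diagnostic sketch, not a recommended production setting; the worker count of 8 is an arbitrary value chosen for the experiment, and the SELECT list would be replaced by the full production one.

```sql
-- Diagnostic only: SET LOCAL is reverted when the transaction ends.
BEGIN;
SET LOCAL parallel_setup_cost = 0;
SET LOCAL parallel_tuple_cost = 0;
SET LOCAL max_parallel_workers_per_gather = 8;  -- assumed value for the test

EXPLAIN (ANALYZE, BUFFERS)
SELECT mainTable.field02,
       COALESCE(SUM(mainTable.value01), 0) AS "value01",
       COALESCE(SUM(mainTable.value05), 0) AS "value05"
FROM aggregates_dates mainTable
WHERE mainTable.date >= '2020-01-01 00:00:00'
  AND mainTable.date <  '2020-02-01 00:00:00'
  AND mainTable.currency = 'EUR'
GROUP BY mainTable.field02;
ROLLBACK;
```

If a Gather node appears with the costs zeroed but not with the defaults, the planner is estimating the parallel plan as more expensive for the wide SELECT list, which would match the behaviour described above.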