sql - Optimize SELECT count(*) on a large table

Question

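A basic count on a large table, on PostgreSQL 14 with 64 GB of RAM and 20 threads. Storage is an NVMe disk.

Questions:

How can I improve this SELECT count query? Which optimizations should I make to the Postgres configuration?
Workers Planned is 4, but Workers Launched is 0. Is that normal?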

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM public.product;

Finalize Aggregate  (cost=2691545.69..2691545.70 rows=1 width=8) (actual time=330901.439..330902.951 rows=1 loops=1)
  Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
  I/O Timings: read=36692.273 write=6548.923
  ->  Gather  (cost=2691545.27..2691545.68 rows=4 width=8) (actual time=330901.342..330902.861 rows=1 loops=1)
        Workers Planned: 4
        Workers Launched: 0
        Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
        I/O Timings: read=36692.273 write=6548.923
        ->  Partial Aggregate  (cost=2690545.27..2690545.28 rows=1 width=8) (actual time=330898.747..330898.757 rows=1 loops=1)
              Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
              I/O Timings: read=36692.273 write=6548.923
              ->  Parallel Index Only Scan using points on products  (cost=0.57..2634234.99 rows=22524114 width=0) (actual time=0.361..222958.361 rows=90993600 loops=1)
                    Heap Fetches: 46261956
                    Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
                    I/O Timings: read=36692.273 write=6548.923
Planning:
  Buffers: shared hit=39 read=8
  I/O Timings: read=0.398
Planning Time: 2.561 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.691 ms, Inlining 104.789 ms, Optimization 24.169 ms, Emission 22.457 ms, Total 152.107 ms
Execution Time: 330999.777 ms

score 2 · Accepted Answer

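Workers Planned is 4, but Workers Launched is 0. Is that normal?

It can happen when too many concurrent transactions compete for a limited number of allowed parallel workers. The manual:

The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this occurrence is frequent, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.

You can also optimize overall performance to free up resources, or get better hardware (in addition to ramping up max_parallel_workers).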
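
As a rough sketch of how to inspect and raise these limits: the numbers below are placeholders chosen for a 20-thread machine, not recommendations, and the right values depend on the rest of the workload.

```sql
-- Current limits that govern parallel workers
SELECT name, setting, context
FROM   pg_settings
WHERE  name IN ('max_worker_processes',
                'max_parallel_workers',
                'max_parallel_workers_per_gather');

-- How many parallel workers are busy right now (run while the system is under load)
SELECT count(*) AS busy_parallel_workers
FROM   pg_stat_activity
WHERE  backend_type = 'parallel worker';

-- Illustrative values only (assumptions, not tuned for this server)
ALTER SYSTEM SET max_worker_processes = 20;  -- takes effect only after a server restart
ALTER SYSTEM SET max_parallel_workers = 16;  -- a configuration reload is enough
SELECT pg_reload_conf();
```

If busy_parallel_workers regularly sits at the max_parallel_workers limit, that would be consistent with the count query finding no free worker to launch.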

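What's also troubling:

Heap Fetches: 46261956

That is for 90993600 rows, which is too many for comfort. An index-only scan is not supposed to perform that many heap fetches.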
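
One way to gauge how far the visibility map lags behind is to compare relallvisible with relpages in pg_class. This is a sketch that assumes the table is public.product, as in the question's query. Both columns are estimates that VACUUM and ANALYZE refresh, so treat the percentage as approximate.

```sql
-- Share of heap pages currently marked all-visible (approximate)
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM   pg_class
WHERE  oid = 'public.product'::regclass;

-- A manual run updates the visibility map and the statistics in one pass
VACUUM (VERBOSE, ANALYZE) public.product;
```

A low percentage means an index-only scan has to visit the heap for most rows, which matches the high Heap Fetches figure in the plan.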

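Both symptoms point to massive concurrent write access (or to long-running transactions hogging resources and keeping autovacuum from doing its job). Look into that, and/or tune the per-table autovacuum settings for table product to be more aggressive, so that column statistics are more valid and the visibility map can keep up. See:

Aggressive Autovacuum on PostgreSQL

Also, with valid table statistics, a (blazingly fast!) estimate might be good enough? See:

Fast way to discover the row count of a table in PostgreSQL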
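
A minimal sketch of both checks, again assuming the table is public.product. The scale factors are illustrative guesses rather than tuned values; the server defaults are considerably higher.

```sql
-- Oldest open transactions, which can hold back vacuum cleanup
SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start
LIMIT  10;

-- When autovacuum last processed the table, and how much work is pending
SELECT last_autovacuum, last_autoanalyze, n_dead_tup, n_mod_since_analyze
FROM   pg_stat_user_tables
WHERE  relid = 'public.product'::regclass;

-- Per-table settings that make autovacuum trigger earlier (assumed values)
ALTER TABLE public.product SET (
   autovacuum_vacuum_scale_factor        = 0.01
 , autovacuum_vacuum_insert_scale_factor = 0.01
 , autovacuum_analyze_scale_factor       = 0.01
);
```

Per-table storage parameters override the global autovacuum settings for this table only, so other tables keep their existing behavior.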
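
For illustration, the estimate can be read straight from the system catalog. This assumes the table is public.product and that it has been vacuumed or analyzed recently; in PostgreSQL 14, reltuples is -1 for a table that has never been vacuumed or analyzed.

```sql
-- Planner's row estimate; returns immediately, without scanning the table
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
```

How close the estimate is to the true count depends on how recently the statistics were refreshed and how much the table has changed since.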
