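I have a Postgres database with a three-column table (roughly 150 000 rows): row_id, a position (pos), and the counted number of flowers (counted_flowers). Because the positions are fairly close to each other, the counts may contain some errors, so I would like to know the mean count at each location. I wrote the following SQL query: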
WITH buffered_zones AS (
    SELECT row_id,
           ST_Buffer(pos, 0.00001) AS buffers
    FROM flower_tbl
    LIMIT 100
),
mean_vals AS (
    SELECT ft.row_id,
           pos,
           (SELECT Avg(ft.counted_flowers)
            FROM buffered_zones bz
            WHERE ST_Intersects(ft.pos, buffers)
              AND bz.row_id = ft.row_id) AS counted_flowers
    FROM flower_tbl ft
    GROUP BY ft.row_id,
             pos
)
SELECT *
FROM mean_vals
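This works and gives me the mean count for the first 100 points in about 15 seconds. When I raise the limit to 1000, the runtime grows to roughly 1.5 minutes, and so on. I would like to run the query for all 150 000 rows in under 30 seconds. Any suggestions? There is a btree index on row_id and a GiST index on pos.
I have added the query plan: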
CTE Scan on mean_vals (cost=4219889.10..4222939.24 rows=152507 width=44) (actual time=214.941..8703.535 rows=152507 loops=1)
  CTE buffered_zones
    -> Limit (cost=0.00..28.40 rows=100 width=36) (actual time=0.957..23.034 rows=100 loops=1)
         -> Seq Scan on flower_table (cost=0.00..43311.82 rows=152507 width=36) (actual time=0.955..23.003 rows=100 loops=1)
  CTE mean_vals
    -> GroupAggregate (cost=22486.79..4219860.70 rows=152507 width=40) (actual time=214.938..8583.157 rows=152507 loops=1)
         Group Key: ft.row_id, ft.pos
         -> Sort (cost=22486.79..22868.06 rows=152507 width=40) (actual time=191.499..249.276 rows=152507 loops=1)
              Sort Key: ft.row_id, ft.pos
              Sort Method: external sort Disk: 7448kB
              -> Seq Scan on flower_table ft (cost=0.00..5185.07 rows=152507 width=40) (actual time=0.014..66.537 rows=152507 loops=1)
         SubPlan 2
           -> CTE Scan on buffered_zones bz (cost=0.00..27.50 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=152507)
                Filter: ((ft.pos && buffers) AND (row_id = ft.row_id) AND _st_intersects(ft.pos, buffers))
                Rows Removed by Filter: 100
Planning time: 1.868 ms
Execution time: 8726.110 ms
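To make the goal concrete: what I am ultimately after is, for each point, the mean count over all points lying within the buffer distance of it. A rough, untested sketch of that as a self-join is below. It assumes that ST_DWithin with the same 0.00001 distance is an acceptable stand-in for buffering and intersecting, and that row_id is unique. The alias mean_counted_flowers is only illustrative.

```sql
-- Sketch only: mean count over every point within 0.00001 units of each point.
-- Assumes row_id is unique and that pos is a PostGIS geometry column.
SELECT a.row_id,
       a.pos,
       avg(b.counted_flowers) AS mean_counted_flowers  -- illustrative alias
FROM flower_tbl a
JOIN flower_tbl b
  ON ST_DWithin(a.pos, b.pos, 0.00001)  -- distance test instead of buffer + intersect
GROUP BY a.row_id, a.pos;
```

Each point is within distance zero of itself, so its own count is included in its mean.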