I have 3 tables that I would like to join using inner joins in Postgres 9.1: reads, devices, and patient_devices. Abbreviated schemas for each of them are below.
reads - ~250,000 rows
CREATE TABLE reads
(
id serial NOT NULL,
device_id integer NOT NULL,
value bigint NOT NULL,
read_datetime timestamp without time zone NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT reads_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE reads
OWNER TO postgres;
CREATE INDEX index_reads_on_device_id
ON reads
USING btree
(device_id );
CREATE INDEX index_reads_on_read_datetime
ON reads
USING btree
(read_datetime );
devices - ~500 rows
CREATE TABLE devices
(
id serial NOT NULL,
serial_number character varying(20) NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT devices_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE devices
OWNER TO postgres;
CREATE UNIQUE INDEX index_devices_on_serial_number
ON devices
USING btree
(serial_number COLLATE pg_catalog."default" );
patient_devices - ~25,000 rows
CREATE TABLE patient_devices
(
id serial NOT NULL,
patient_id integer NOT NULL,
device_id integer NOT NULL,
issuance_datetime timestamp without time zone NOT NULL,
unassignment_datetime timestamp without time zone,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT patient_devices_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE patient_devices
OWNER TO postgres;
CREATE INDEX index_patient_devices_on_device_id
ON patient_devices
USING btree
(device_id );
CREATE INDEX index_patient_devices_on_issuance_datetime
ON patient_devices
USING btree
(issuance_datetime );
CREATE INDEX index_patient_devices_on_patient_id
ON patient_devices
USING btree
(patient_id );
CREATE INDEX index_patient_devices_on_unassignment_datetime
ON patient_devices
USING btree
(unassignment_datetime );
patients - ~1,000 rows
CREATE TABLE patients
(
id serial NOT NULL,
first_name character varying(50) NOT NULL,
middle_name character varying(50),
last_name character varying(50) NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT participants_pkey PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE patients
OWNER TO postgres;
Here is my short query.
SELECT patient_devices.patient_id, serial_number FROM reads
INNER JOIN devices ON devices.id = reads.device_id
INNER JOIN patient_devices ON patient_devices.device_id = devices.id
WHERE (reads.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42')
AND (read_datetime > issuance_datetime) AND ((unassignment_datetime IS NOT NULL AND read_datetime < unassignment_datetime) OR
(unassignment_datetime IS NULL))
GROUP BY serial_number, patient_devices.patient_id LIMIT 10
Ultimately this will be a small part of a larger query (the real query has no LIMIT; I only added the LIMIT to prove to myself that the long runtime was not caused by returning a pile of rows), but I have done a bunch of experimenting and determined that this is the slow part of the larger query. When I run EXPLAIN ANALYZE on this query I get the following output (also viewable here):
Limit  (cost=156442.31..156442.41 rows=10 width=13) (actual time=2815.435..2815.441 rows=10 loops=1)
  ->  HashAggregate  (cost=156442.31..159114.89 rows=267258 width=13) (actual time=2815.432..2815.437 rows=10 loops=1)
        ->  Hash Join  (cost=1157.78..151455.79 rows=997304 width=13) (actual time=30.930..2739.164 rows=250150 loops=1)
              Hash Cond: (reads.device_id = devices.id)
              Join Filter: ((reads.read_datetime > patient_devices.issuance_datetime) AND (((patient_devices.unassignment_datetime IS NOT NULL) AND (reads.read_datetime < patient_devices.unassignment_datetime)) OR (patient_devices.unassignment_datetime IS NULL)))
              ->  Seq Scan on reads  (cost=0.00..7236.94 rows=255396 width=12) (actual time=0.035..64.433 rows=255450 loops=1)
                    Filter: ((read_datetime >= '2012-01-01 10:30:01'::timestamp without time zone) AND (read_datetime <= '2013-05-18 03:03:42'::timestamp without time zone))
              ->  Hash  (cost=900.78..900.78 rows=20560 width=37) (actual time=30.830..30.830 rows=25015 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 1755kB
                    ->  Hash Join  (cost=19.90..900.78 rows=20560 width=37) (actual time=0.776..20.551 rows=25015 loops=1)
                          Hash Cond: (patient_devices.device_id = devices.id)
                          ->  Seq Scan on patient_devices  (cost=0.00..581.93 rows=24893 width=24) (actual time=0.014..7.867 rows=25545 loops=1)
                                Filter: ((unassignment_datetime IS NOT NULL) OR (unassignment_datetime IS NULL))
                          ->  Hash  (cost=13.61..13.61 rows=503 width=13) (actual time=0.737..0.737 rows=503 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 24kB
                                ->  Seq Scan on devices  (cost=0.00..13.61 rows=503 width=13) (actual time=0.016..0.466 rows=503 loops=1)
                                      Filter: (entity_id = 2)
Total runtime: 2820.392 ms
My question is: how can I speed this up? Right now I am running this on my Windows machine for testing, but eventually it will be deployed on Ubuntu; will that make a difference? Any insight into why this takes 2 seconds would be greatly appreciated.
Thanks
It was suggested that the LIMIT may be changing the query plan. Here is the same query without the LIMIT. The slow part still appears to be the hash join.
Also, here are the relevant tuning parameters. Again, I am only testing on Windows for now, and I do not know what effect these would have on a Linux machine:
shared_buffers = 2GB
effective_cache_size = 4GB
work_mem = 256MB
random_page_cost = 2.0
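For reference, these values can be read back from inside Postgres with a quick pg_settings query (just a sanity check that the settings took effect):

-- Show the effective value and unit for each tuning parameter
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem', 'random_page_cost');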
Here are the statistics for the reads table:
Statistic Value
Sequential Scans 130
Sequential Tuples Read 28865850
Index Scans 283630
Index Tuples Fetched 141421907
Tuples Inserted 255450
Tuples Updated 0
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 255450
Dead Tuples 0
Heap Blocks Read 20441
Heap Blocks Hit 3493033
Index Blocks Read 8824
Index Blocks Hit 4840210
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum 2013-05-20 09:23:03.782-07
Last Autovacuum
Last Analyze 2013-05-20 09:23:03.91-07
Last Autoanalyze 2013-05-17 19:01:44.075-07
Vacuum counter 1
Autovacuum counter 0
Analyze counter 1
Autoanalyze counter 6
Table Size 27 MB
Toast Table Size none
Indexes Size 34 MB
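These per-table counters correspond to the pg_stat_user_tables and pg_statio_user_tables views (the block read/hit numbers come from the latter), so an equivalent snapshot for any of these tables can be pulled with something like:

-- Abbreviated column list; swap in the table name of interest
SELECT seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
       n_tup_ins, n_live_tup, n_dead_tup,
       last_vacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'reads';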
Here are the statistics for the devices table:
Statistic Value
Sequential Scans 119
Sequential Tuples Read 63336
Index Scans 1053935
Index Tuples Fetched 1053693
Tuples Inserted 609
Tuples Updated 0
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 609
Dead Tuples 0
Heap Blocks Read 32
Heap Blocks Hit 1054553
Index Blocks Read 32
Index Blocks Hit 2114305
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum
Last Analyze
Last Autoanalyze 2013-05-17 19:02:49.692-07
Vacuum counter 0
Autovacuum counter 0
Analyze counter 0
Autoanalyze counter 2
Table Size 48 kB
Toast Table Size none
Indexes Size 128 kB
Here are the statistics for the patient_devices table:
Statistic Value
Sequential Scans 137
Sequential Tuples Read 3065400
Index Scans 853990
Index Tuples Fetched 46143763
Tuples Inserted 25545
Tuples Updated 24936
Tuples Deleted 0
Tuples HOT Updated 0
Live Tuples 25547
Dead Tuples 929
Heap Blocks Read 1959
Heap Blocks Hit 6099617
Index Blocks Read 1077
Index Blocks Hit 2462681
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum 2013-05-17 19:01:44.576-07
Last Analyze
Last Autoanalyze 2013-05-17 19:01:44.697-07
Vacuum counter 0
Autovacuum counter 6
Analyze counter 0
Autoanalyze counter 6
Table Size 2624 kB
Toast Table Size none
Indexes Size 5312 kB
Below is the full query I am trying to speed up. The smaller query is indeed faster now, but I have not been able to make my full query, shown below, any faster. As suggested, I added 4 new indexes: UNIQUE(device_id, issuance_datetime), UNIQUE(device_id, unassignment_datetime), UNIQUE(patient_id, issuance_datetime), and UNIQUE(patient_id, unassignment_datetime), as sketched in the DDL just below.
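A sketch of the DDL for those four indexes (the index names are my own shorthand, and whether UNIQUE actually holds depends on the data):

-- Composite unique indexes on the assignment table, as suggested
CREATE UNIQUE INDEX index_patient_devices_on_device_id_and_issuance
ON patient_devices (device_id, issuance_datetime);
CREATE UNIQUE INDEX index_patient_devices_on_device_id_and_unassignment
ON patient_devices (device_id, unassignment_datetime);
CREATE UNIQUE INDEX index_patient_devices_on_patient_id_and_issuance
ON patient_devices (patient_id, issuance_datetime);
CREATE UNIQUE INDEX index_patient_devices_on_patient_id_and_unassignment
ON patient_devices (patient_id, unassignment_datetime);

With those in place, here is the full query: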
SELECT
first_name
, last_name
, MAX(max_read) AS read_datetime
, SUM(value) AS value
, serial_number
FROM (
SELECT
pa.first_name
, pa.last_name
, value
, first_value(de.serial_number) OVER(PARTITION BY pa.id ORDER BY re.read_datetime DESC) AS serial_number -- I'm not sure if this is a good way to do this, but I don't know of another way
, re.read_datetime
, MAX(re.read_datetime) OVER (PARTITION BY pd.id) AS max_read
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
AND re.read_datetime >= pd.issuance_datetime
AND re.read_datetime < COALESCE(pd.unassignment_datetime , 'infinity'::timestamp)
INNER JOIN patients pa ON pa.id = pd.patient_id
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
) AS foo WHERE read_datetime = max_read
GROUP BY first_name, last_name, serial_number ORDER BY value DESC
LIMIT 10
Sorry for not posting this sooner; I thought the query was too complicated, so I was trying to distill the problem down, but apparently I still cannot figure it out. It seems like it would be much faster if I could filter the rows returned by the nested select using the max_read value, but according to numerous sources that is not allowed in Postgres, since a window function cannot be referenced in the WHERE clause of the same query level.
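For illustration, here is one common shape for that kind of rewrite, offered as a sketch rather than a drop-in replacement: it finds the latest read per assignment in a plain aggregating subquery first, then joins back to reads, so no window functions are needed. Note it is not exactly equivalent to the window-function version when a patient has several assignments inside the date range.

SELECT
  pa.first_name
, pa.last_name
, lr.max_read AS read_datetime
, SUM(re.value) AS value
, de.serial_number
FROM (
    -- Latest read per device assignment within the window
    SELECT pd.patient_id, pd.device_id, MAX(r.read_datetime) AS max_read
    FROM reads r
    INNER JOIN patient_devices pd ON pd.device_id = r.device_id
        AND r.read_datetime >= pd.issuance_datetime
        AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    WHERE r.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
    GROUP BY pd.patient_id, pd.device_id
) lr
-- Join back to pick up only the reads at that latest timestamp
INNER JOIN reads re ON re.device_id = lr.device_id AND re.read_datetime = lr.max_read
INNER JOIN devices de ON de.id = lr.device_id
INNER JOIN patients pa ON pa.id = lr.patient_id
GROUP BY pa.first_name, pa.last_name, lr.max_read, de.serial_number
ORDER BY value DESC
LIMIT 10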