I've been playing with TimescaleDB recently, but I'm a bit confused and could use some pointers on why my query is running slowly, or confirmation that this is typical performance for a TimescaleDB query.
The dataset I'm using is market quote data for a single date; I've loaded roughly 84 million rows into my hypertable.
Here is a sample of the data in my file:
2018-12-03 00:00:00.000344+00:00,2181.T,2018-12-03,2179,56300,2180,59500
2018-12-03 00:00:00.000629+00:00,1570.T,2018-12-03,20470,555118,20480,483857
2018-12-03 00:00:00.000631+00:00,2002.T,2018-12-03,2403,30300,2404,30200
My table was created like this:
CREATE TABLE tt1 (
    time   TIMESTAMPTZ NOT NULL,
    cusip  VARCHAR(40) NOT NULL,
    date   DATE NULL,
    value  DOUBLE PRECISION,
    value2 DOUBLE PRECISION,
    value3 DOUBLE PRECISION,
    value4 DOUBLE PRECISION
);
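For reference, CSV rows like the sample above can be bulk-loaded with COPY. A minimal sketch (the file name and path here are hypothetical; I'm assuming the columns appear in the file in the order shown above):

```sql
-- Bulk-load the quote file into the hypertable; 'quotes.csv' is a
-- placeholder name, not the actual file used.
COPY tt1 (time, cusip, date, value, value2, value3, value4)
FROM '/path/to/quotes.csv' WITH (FORMAT csv);
```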
I created two versions of the hypertable: tt1 with 1-minute chunks and tt30m with 30-minute chunks. Both tables follow the same schema above. I created the hypertable like this:
SELECT create_hypertable('tt1', 'time', chunk_time_interval => interval '1 minute');
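The 30-minute variant was created the same way; a sketch, assuming tt30m has the identical schema:

```sql
-- Same call, but with a 30-minute chunk interval for the tt30m table.
SELECT create_hypertable('tt30m', 'time', chunk_time_interval => interval '30 minutes');
```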
The time and cusip columns are indexed in both versions of the hypertable. Creating a hypertable indexes time by default, and I created the cusip index with:
CREATE INDEX ON tt1(cusip, time DESC);
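For reference, the default time index that create_hypertable builds (unless create_default_indexes is disabled) is, to my understanding, equivalent to:

```sql
-- Default index created by create_hypertable on the time dimension.
CREATE INDEX ON tt1 (time DESC);
```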
My query looks like this:
EXPLAIN ANALYZE SELECT time_bucket('15 minutes', time) AS fifteen_min,
cusip, COUNT(*)
FROM tt1
WHERE time > timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours'
GROUP BY fifteen_min, cusip
ORDER BY fifteen_min DESC;
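One thing worth noting about the WHERE clause: the literal is typed `timestamp` (without time zone), so Postgres discards the `-05` offset inside it. The bound the planner actually compares against can be checked directly, and it matches the Filter lines in the plans below:

```sql
-- The -05 offset is ignored by the timestamp-without-time-zone literal,
-- so the bound is simply 05:10:06.174704 minus 3 hours.
SELECT timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours';
-- 2018-12-03 02:10:06.174704
```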
With 30-minute chunks, the query takes 25.969 seconds. Here is its query plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=1679944.84..1685344.84 rows=40000 width=40) (actual time=25770.209..25873.410 rows=305849 loops=1)
Group Key: (time_bucket('00:15:00'::interval, tt30m."time")), tt30m.cusip
-> Gather Merge (cost=1679944.84..1684544.84 rows=40000 width=40) (actual time=25770.181..25885.080 rows=305849 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=1678944.83..1679044.83 rows=40000 width=40) (actual time=12880.868..12911.917 rows=152924 loops=2)
Sort Key: (time_bucket('00:15:00'::interval, tt30m."time")) DESC, tt30m.cusip
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: external merge Disk: 10976kB
-> Partial HashAggregate (cost=1675387.29..1675887.29 rows=40000 width=40) (actual time=12501.381..12536.373 rows=152924 loops=2)
Group Key: time_bucket('00:15:00'::interval, tt30m."time"), tt30m.cusip
->  Parallel Custom Scan (ChunkAppend) on tt30m  (cost=10680.22..1416961.58 rows=34456761 width=32) (actual time=0.020..7293.929 rows=24255398 loops=2)
Chunks excluded during startup: 14
->  Parallel Seq Scan on _hyper_2_753_chunk  (cost=0.00..116011.42 rows=4366426 width=17) (actual time=0.037..1502.121 rows=7423073 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
->  Parallel Seq Scan on _hyper_2_755_chunk  (cost=0.00..108809.26 rows=4095539 width=17) (actual time=0.017..1446.248 rows=6962556 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
->  Parallel Seq Scan on _hyper_2_754_chunk  (cost=0.00..107469.27 rows=4056341 width=17) (actual time=0.015..1325.638 rows=6895917 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
->  Parallel Seq Scan on _hyper_2_756_chunk  (cost=0.00..99037.70 rows=3730381 width=17) (actual time=0.006..1206.708 rows=6341775 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
->  Parallel Seq Scan on _hyper_2_758_chunk  (cost=0.00..90757.67 rows=3421505 width=17) (actual time=0.017..1126.757 rows=5816675 loops=1)
Time: 25968.520 ms (00:25.969)
With 1-minute chunks, the query takes 25.686 seconds. Here is the query plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=1048760.27..1054160.27 rows=40000 width=25) (actual time=25306.291..25409.778 rows=305849 loops=1)
Group Key: (time_bucket('00:15:00'::interval, tt1."time")), tt1.cusip
-> Gather Merge (cost=1048760.27..1053360.27 rows=40000 width=25) (actual time=25306.282..25424.711 rows=305849 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=1047760.26..1047860.26 rows=40000 width=25) (actual time=12629.859..12665.190 rows=152924 loops=2)
Sort Key: (time_bucket('00:15:00'::interval, tt1."time")) DESC, tt1.cusip
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: external merge Disk: 10976kB
-> Partial HashAggregate (cost=1044202.72..1044702.72 rows=40000 width=25) (actual time=12276.755..12311.071 rows=152924 loops=2)
Group Key: time_bucket('00:15:00'::interval, tt1."time"), tt1.cusip
->  Parallel Custom Scan (ChunkAppend) on tt1  (cost=0.42..830181.18 rows=28536205 width=17) (actual time=0.013..7147.401 rows=24255398 loops=2)
Chunks excluded during startup: 430
-> Parallel Seq Scan on _hyper_1_564_chunk (cost=0.00..4776.72 rows=180218 width=17) (actual time=0.022..56.440 rows=306370 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_571_chunk (cost=0.00..4632.82 rows=174066 width=16) (actual time=0.006..55.440 rows=295912 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_553_chunk (cost=0.00..4598.08 rows=173526 width=17) (actual time=0.019..61.084 rows=294995 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_499_chunk (cost=0.00..4586.53 rows=172922 width=17) (actual time=0.006..64.104 rows=293968 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_498_chunk (cost=0.00..4528.29 rows=170504 width=17) (actual time=0.005..52.295 rows=289856 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_502_chunk (cost=0.00..4509.36 rows=169949 width=17) (actual time=0.005..53.786 rows=288913 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
-> Parallel Seq Scan on _hyper_1_645_chunk (cost=0.00..4469.19 rows=168735 width=17) (actual time=0.013..55.431 rows=286850 loops=1)
Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
Essentially, what I'm looking for is some pointers on whether this is the expected performance from TimescaleDB, or whether there is some way to optimize this query.
I've already run the timescaledb-tune tool and accepted all of the optimizations it suggested. I'm running this on a Linux VM with 20 GB of RAM, 250 GB+ of disk space, and 2 CPUs. The Postgres version is 11.6 and the TimescaleDB version is 1.5.0. The output of dump_meta_data is attached here: dump meta data output
Thanks a lot for any replies :)