
I've been playing with TimescaleDB recently, but I'm a bit confused and could use some pointers on why my query runs slowly, or confirmation that this is typical query performance for TimescaleDB.

The dataset I'm using is market quote data for one specific date; I've loaded roughly 84 million rows into my hypertable.

Here's a sample of the data in my file:

2018-12-03 00:00:00.000344+00:00,2181.T,2018-12-03,2179,56300,2180,59500
2018-12-03 00:00:00.000629+00:00,1570.T,2018-12-03,20470,555118,20480,483857
2018-12-03 00:00:00.000631+00:00,2002.T,2018-12-03,2403,30300,2404,30200

My table was created like this:


CREATE TABLE tt1 (
    time    TIMESTAMPTZ       NOT NULL,
    cusip   varchar(40)       NOT NULL,
    date    DATE              NULL,
    value   DOUBLE PRECISION,
    value2  DOUBLE PRECISION,
    value3  DOUBLE PRECISION,
    value4  DOUBLE PRECISION
);

I created two versions of the hypertable: tt1 with 1-minute chunks and tt30m with 30-minute chunks. Both tables follow the same schema above. I created the hypertables like this:

SELECT create_hypertable('tt1', 'time', chunk_time_interval => interval '1 minute');
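
and analogously for the 30-minute version (a sketch of the equivalent statement, assuming tt30m has the identical schema described above):

SELECT create_hypertable('tt30m', 'time', chunk_time_interval => interval '30 minutes');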

The time and cusip columns are indexed in both versions of the hypertable. time is indexed by default when the hypertable is created, and I created the cusip index with the following:

  CREATE INDEX ON tt1(cusip, time DESC);

My query looks like this:

EXPLAIN ANALYZE SELECT time_bucket('15 minutes', time) AS fifteen_min,
  cusip, COUNT(*)
  FROM tt1
  WHERE time > timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours'
  GROUP BY fifteen_min, cusip
  ORDER BY fifteen_min DESC;

With 30-minute chunks, the query takes 25.969 seconds. Here is its query plan:

                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1679944.84..1685344.84 rows=40000 width=40) (actual time=25770.209..25873.410 rows=305849 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, tt30m."time")), tt30m.cusip
   ->  Gather Merge  (cost=1679944.84..1684544.84 rows=40000 width=40) (actual time=25770.181..25885.080 rows=305849 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=1678944.83..1679044.83 rows=40000 width=40) (actual time=12880.868..12911.917 rows=152924 loops=2)
               Sort Key: (time_bucket('00:15:00'::interval, tt30m."time")) DESC, tt30m.cusip
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: external merge  Disk: 10976kB
               ->  Partial HashAggregate  (cost=1675387.29..1675887.29 rows=40000 width=40) (actual time=12501.381..12536.373 rows=152924 loops=2)
                     Group Key: time_bucket('00:15:00'::interval, tt30m."time"), tt30m.cusip
                     ->  Parallel Custom Scan (ChunkAppend) on tt30m  (cost=10680.22..1416961.58 rows=34456761 width=32) (actual time=0.020..7293.929 rows=24255398 loops=2)
                           Chunks excluded during startup: 14
                           ->  Parallel Seq Scan on _hyper_2_753_chunk  (cost=0.00..116011.42 rows=4366426 width=17) (actual time=0.037..1502.121 rows=7423073 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_755_chunk  (cost=0.00..108809.26 rows=4095539 width=17) (actual time=0.017..1446.248 rows=6962556 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_754_chunk  (cost=0.00..107469.27 rows=4056341 width=17) (actual time=0.015..1325.638 rows=6895917 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_756_chunk  (cost=0.00..99037.70 rows=3730381 width=17) (actual time=0.006..1206.708 rows=6341775 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_758_chunk  (cost=0.00..90757.67 rows=3421505 width=17) (actual time=0.017..1126.757 rows=5816675 loops=1)
Time: 25968.520 ms (00:25.969)

With 1-minute chunks, the query takes 25.686 seconds. Here is the query plan:

                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1048760.27..1054160.27 rows=40000 width=25) (actual time=25306.291..25409.778 rows=305849 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, tt1."time")), tt1.cusip
   ->  Gather Merge  (cost=1048760.27..1053360.27 rows=40000 width=25) (actual time=25306.282..25424.711 rows=305849 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=1047760.26..1047860.26 rows=40000 width=25) (actual time=12629.859..12665.190 rows=152924 loops=2)
               Sort Key: (time_bucket('00:15:00'::interval, tt1."time")) DESC, tt1.cusip
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: external merge  Disk: 10976kB
               ->  Partial HashAggregate  (cost=1044202.72..1044702.72 rows=40000 width=25) (actual time=12276.755..12311.071 rows=152924 loops=2)
                     Group Key: time_bucket('00:15:00'::interval, tt1."time"), tt1.cusip
                     ->  Parallel Custom Scan (ChunkAppend) on tt1  (cost=0.42..830181.18 rows=28536205 width=17) (actual time=0.013..7147.401 rows=24255398 loops=2)
                           Chunks excluded during startup: 430
                           ->  Parallel Seq Scan on _hyper_1_564_chunk  (cost=0.00..4776.72 rows=180218 width=17) (actual time=0.022..56.440 rows=306370 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_571_chunk  (cost=0.00..4632.82 rows=174066 width=16) (actual time=0.006..55.440 rows=295912 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_553_chunk  (cost=0.00..4598.08 rows=173526 width=17) (actual time=0.019..61.084 rows=294995 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_499_chunk  (cost=0.00..4586.53 rows=172922 width=17) (actual time=0.006..64.104 rows=293968 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_498_chunk  (cost=0.00..4528.29 rows=170504 width=17) (actual time=0.005..52.295 rows=289856 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_502_chunk  (cost=0.00..4509.36 rows=169949 width=17) (actual time=0.005..53.786 rows=288913 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_645_chunk  (cost=0.00..4469.19 rows=168735 width=17) (actual time=0.013..55.431 rows=286850 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)

Essentially, what I'm looking for are some pointers on whether this is the expected performance of TimescaleDB, or whether there is some way to optimize this query.

I've already run the timescaledb-tune tool and accepted all the optimizations it suggested. I'm running this on a Linux VM with 20 GB of RAM, 250 GB+ of disk space, and 2 CPUs. The Postgres version is 11.6 and the TimescaleDB version is 1.5.0. The output of dump_meta_data is attached here: dump meta data output

Thanks very much for any responses :)


1 Answer


This query looks like it has to scan all the records in the 3-hour window in either case, which is why it takes time. There are a few options for speeding this sort of thing up. One is hardware: the virtual hardware here may be slowing you down, because this needs a fair amount of IO and your box is rather small, which can hurt a lot on the IO side, so a bigger box would help. Changing the chunk size has little effect; chunk size barely matters for this kind of query, and in fact I would suggest larger chunks, since 84M rows isn't really that many. The other option is to use continuous aggregates to pre-compute some of this operation for you; if this is the type of query you'll be running, that can save you some time and some of the CPU/memory/IO concerns.
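For instance, a continuous aggregate pre-computing the 15-minute counts might look like the following (a sketch only, using the TimescaleDB 1.x continuous-aggregate syntax that matches the 1.5.0 version above; the view name quote_counts_15m is made up):

-- Sketch: continuous aggregate of 15-minute counts per cusip (TimescaleDB 1.x syntax).
-- The name quote_counts_15m is hypothetical.
CREATE VIEW quote_counts_15m
WITH (timescaledb.continuous) AS
SELECT time_bucket('15 minutes', time) AS fifteen_min,
       cusip,
       COUNT(*) AS quote_count
FROM tt1
GROUP BY fifteen_min, cusip;

-- The original query then becomes a read of pre-aggregated rows:
SELECT fifteen_min, cusip, quote_count
FROM quote_counts_15m
WHERE fifteen_min > timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours'
ORDER BY fifteen_min DESC;

And if you do want to try larger chunks, set_chunk_time_interval changes the interval for chunks created from that point on (existing chunks keep their size); for example, assuming a 1-day interval is a reasonable target:

SELECT set_chunk_time_interval('tt30m', interval '1 day');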

Answered 2019-12-05T16:24:37.203