
I have time-series data and I am trying to make the database structure and queries as efficient as possible.

I have indexed id and datetime (descending) on the table.

SELECT
    table.id,
    to_char(time_bucket('2 hours', datetime) AT TIME ZONE 'utc', 'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS time,
    avg(value) AS value,
    mapping.description
FROM table
JOIN mapping ON table.id = mapping.id
WHERE table.id IN (10000, 10004, 1001, 10005)
  AND datetime BETWEEN '2019-09-25' AND '2019-09-30'
GROUP BY time, table.id, mapping.description
ORDER BY time DESC;

The table structure is as follows:

                        Table "public.table"
  Column  |            Type             | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
 datetime | timestamp without time zone |           | not null |
 id       | integer                     |           | not null |
 value    | double precision            |           |          |
Indexes:
    "table_datetime_idx" btree (datetime DESC)
    "table_id_datetime_idx" btree (id, datetime DESC)

The mapping table:

                      Table "public.mapping"
   Column    |       Type        | Collation | Nullable | Default
-------------+-------------------+-----------+----------+---------
 id          | integer           |           | not null |
 tagname     | character varying |           |          |
 description | character varying |           |          |
 unit        | character varying |           |          |
 mineu       | double precision  |           |          |
 maxeu       | double precision  |           |          |

Indexes:
    "mapping_id_idx" btree (id)

There are no errors, but it still doesn't look good or efficient enough to me. Execution currently takes about 14 seconds. What is the simplest way to optimize this query?

Below is the output of EXPLAIN ANALYZE:

 GroupAggregate  (cost=250964.79..265699.28 rows=453369 width=73) (actual time=10247.641..11501.894 rows=60 loops=1)
   Group Key: (to_char(timezone('utc'::text, time_bucket('02:00:00'::interval, _hyper_1_4_chunk.datetime)), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'::text)), _hyper_1_4_chunk.id, mapping.description
   ->  Sort  (cost=250964.79..252098.21 rows=453369 width=73) (actual time=10237.177..10481.057 rows=421712 loops=1)
         Sort Key: (to_char(timezone('utc'::text, time_bucket('02:00:00'::interval, _hyper_1_4_chunk.datetime)), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'::text)) DESC, _hyper_1_4_chunk.id, mapping.description
         Sort Method: external merge  Disk: 33816kB
         ->  Hash Join  (cost=7228.67..196570.23 rows=453369 width=73) (actual time=81.488..5779.432 rows=421712 loops=1)
               Hash Cond: (_hyper_1_4_chunk.id = mapping.id)
               ->  Append  (cost=7215.89..186363.19 rows=452059 width=20) (actual time=81.299..3680.949 rows=421712 loops=1)
                     ->  Bitmap Heap Scan on _hyper_1_4_chunk  (cost=7215.89..129006.87 rows=363549 width=20) (actual time=81.298..3350.870 rows=336860 loops=1)
                           Recheck Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                           Heap Blocks: exact=61125
                           ->  Bitmap Index Scan on _hyper_1_4_chunk_table_id_datetime_idx  (cost=0.00..7125.00 rows=363549 width=0) (actual time=69.006..69.006 rows=336860 loops=1)
                                 Index Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                     ->  Bitmap Heap Scan on _hyper_1_3_chunk  (cost=1766.52..57356.32 rows=88510 width=20) (actual time=20.876..311.867 rows=84852 loops=1)
                           Recheck Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
                           Heap Blocks: exact=16352
                           ->  Bitmap Index Scan on _hyper_1_3_chunk_table_id_datetime_idx  (cost=0.00..1744.39 rows=88510 width=0) (actual time=17.291..17.291 rows=84852 loops=1)
                                 Index Cond: ((id = ANY ('{10000,10004,1001,10005}'::integer[])) AND (datetime >= '2019-09-25 00:00:00'::timestamp without time zone) AND (datetime <= '2019-09-30 00:00:00'::timestamp without time zone))
               ->  Hash  (cost=8.46..8.46 rows=346 width=33) (actual time=0.163..0.163 rows=346 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 31kB
                     ->  Seq Scan on mapping  (cost=0.00..8.46 rows=346 width=33) (actual time=0.019..0.097 rows=346 loops=1)
 Planning time: 1.008 ms
 Execution time: 11507.606 ms

1 Answer


If you raise work_mem to 100 MB or more, the sort should be performed in memory, which will speed up execution.

If you raise work_mem even further, you may get a faster hash aggregate instead of the group aggregate, which will make the query faster still.
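
A minimal way to try this for a single session (the 256 MB figure is illustrative; the plan above spilled roughly 34 MB of sort data to disk, so any value comfortably above that should keep the sort in memory):

-- Raise work_mem for the current session only.
SET work_mem = '256MB';

-- Re-run the query and check the plan: the sort should now report an
-- in-memory method ("Sort Method: quicksort"), or the GroupAggregate
-- may be replaced by a HashAggregate.
EXPLAIN (ANALYZE)
SELECT
    table.id,
    to_char(time_bucket('2 hours', datetime) AT TIME ZONE 'utc', 'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS time,
    avg(value) AS value,
    mapping.description
FROM table
JOIN mapping ON table.id = mapping.id
WHERE table.id IN (10000, 10004, 1001, 10005)
  AND datetime BETWEEN '2019-09-25' AND '2019-09-30'
GROUP BY time, table.id, mapping.description
ORDER BY time DESC;

-- Restore the session default afterwards.
RESET work_mem;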

I don't think there is anything you can do about the index scans.

answered 2019-10-14T06:06:31.603