
I have this query (edited down for simplicity):

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
order by s.sampletimestamp;

I've noticed that it executes noticeably faster if I enter the time values by hand:

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= 1376143200000 and
s.sampletimestamp < 1376229600000
order by s.sampletimestamp;

where the times are epoch timestamps in milliseconds. My guess is that the extract(EPOCH ...) part is being evaluated for every record, when it really only needs to be done once.
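
(For reference, the hard-coded bounds are just those same expressions evaluated once by hand:)

select extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10') * 1000;  -- = 1376143200000
select extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10') * 1000;  -- = 1376229600000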

Is there a way to keep the more readable form of the first query while making it as efficient as the second?

I'm new to PostgreSQL (and entirely self-taught), so I suspect my most common problem is not knowing the particular keyword I should be putting into Google - which I have used, along with the PostgreSQL documentation.

Thanks in advance :)

EDIT1: Thanks for the very detailed reply. I suspect I may be in a different time zone from most respondents - I'll provide experimental evidence for this tomorrow (it's getting late here).

EDIT2: To summarise the answer below, casting to bigint does the trick. Instead of:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000

use:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
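
For completeness, the original query with that cast applied reads:

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
order by s.sampletimestamp;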

1 Answer


What's happening here is that extract is implemented using the date_part function:

regress=> explain select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
                                                                                                                                        QUERY PLAN                                                                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=30.02..30.03 rows=1 width=0)
   ->  Function Scan on generate_series x  (cost=0.00..30.00 rows=5 width=0)
         Filter: (((x)::double precision > (date_part('epoch'::text, '2013-08-10 22:00:00+08'::timestamp with time zone) * 1000::double precision)) AND ((x)::double precision < (date_part('epoch'::text, '2013-08-11 22:00:00+08'::timestamp with time zone) * 1000::double precision)))
(3 rows)

date_part(text, timestamptz) is declared stable, not immutable:

regress=> \df+ date_part
                                                                                                                 List of functions
   Schema   |   Name    | Result data type |        Argument data types        |  Type  | Volatility |  Owner   | Language |                               Source code                                |                 Description                 
------------+-----------+------------------+-----------------------------------+--------+------------+----------+----------+--------------------------------------------------------------------------+---------------------------------------------
 ...
 pg_catalog | date_part | double precision | text, timestamp with time zone    | normal | stable     | postgres | internal | timestamptz_part                                                         | extract field from timestamp with time zone
 ...
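
(If you prefer a catalog query over \df+, the volatility can also be checked directly in pg_proc; 's' means stable, 'i' immutable:)

select p.proname, pg_get_function_arguments(p.oid) as arguments, p.provolatile
from pg_proc p
where p.proname = 'date_part';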

I'm fairly sure this prevents Pg from pre-computing the value and inlining it in place of the call. I'd need to dig deeper to be certain.

I believe the reasoning is that date_part on a timestamptz can depend on the value of the TimeZone setting. That isn't true for date_part('epoch', some_timestamptz), but the query planner doesn't know at planning time that that's how you're using it.

I'm still surprised it isn't pre-computed, given what the documentation says:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.
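
(As a quick check you can run yourself: an immutable expression like 2 + 2 is folded into a constant at plan time, while the stable extraction call stays in the filter, just as in the EXPLAIN above.)

explain select x from generate_series(1, 10) x where x > 2 + 2;   -- filter shows the pre-computed constant (x > 4)
explain select x from generate_series(1, 10) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10');  -- the stable call remains in the filter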

You can get around this by converting with AT TIME ZONE 'UTC' first, which turns the argument into a plain timestamp so the immutable timestamp variant of date_part gets used. For example:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x 
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;

This executes much faster, though the timing difference is bigger than I'd expect if the expression were only being computed once:

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
  count  
---------
 1000000
(1 row)

Time: 767.629 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
  count  
---------
 1000000
(1 row)

Time: 373.453 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > 1376143200000 and x <  1376229600000;
  count  
---------
 1000000
(1 row)

Time: 324.557 ms
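
(The Time: lines come from psql's \timing setting; enable it first if you want to reproduce these numbers:)

-- psql meta-command, not SQL; prints "Time: ... ms" after each statement
\timing on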

It would be possible to remove this query optimizer limitation / add a feature to optimize for it. The optimizer would probably need to recognize at parse time that extract('epoch', ...) is a special case and, instead of calling the non-immutable date_part('epoch', ...), call a special immutable timestamptz_epoch(...) function.

A quick look at perf top results shows that the timestamptz case has the following peaks:

 10.33%  postgres      [.] ExecMakeFunctionResultNoSets
  7.76%  postgres      [.] timesub.isra.1
  6.94%  postgres      [.] datebsearch
  5.58%  postgres      [.] timestamptz_part
  3.82%  postgres      [.] AllocSetAlloc
  2.97%  postgres      [.] ExecEvalConst
  2.68%  postgres      [.] downcase_truncate_identifier
  2.38%  postgres      [.] ExecEvalScalarVarFast
  2.23%  postgres      [.] slot_getattr
  1.99%  postgres      [.] DatumGetFloat8

whereas with AT TIME ZONE we get:

 11.58%  postgres      [.] ExecMakeFunctionResultNoSets
  4.28%  postgres      [.] AllocSetAlloc
  4.18%  postgres      [.] ExecProject
  3.82%  postgres      [.] slot_getattr
  2.99%  libc-2.17.so  [.] __memmove_ssse3
  2.96%  postgres      [.] BufFileWrite
  2.80%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.74%  postgres      [.] BufFileRead
  2.69%  postgres      [.] float8lt

and with the integer case:

  7.92%  postgres      [.] ExecMakeFunctionResultNoSets
  5.36%  postgres      [.] slot_getattr
  4.52%  postgres      [.] AllocSetAlloc
  4.02%  postgres      [.] ExecProject
  3.42%  libc-2.17.so  [.] __memmove_ssse3
  3.33%  postgres      [.] BufFileWrite
  3.31%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.91%  postgres      [.] BufFileRead
  2.90%  postgres      [.] GetMemoryChunkSpace
  2.67%  postgres      [.] AllocSetFree

So you can see that the AT TIME ZONE version avoids the repeated timestamptz_part and datebsearch calls. The main difference between it and the integer case is float8lt; it looks like we're doing double precision comparisons instead of integer comparisons.

Sure enough, a cast takes care of that:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000  
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000;
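
A different approach, not used above, is to hide the calculation in your own wrapper function declared IMMUTABLE. That is safe for 'epoch' specifically because, as noted, it does not depend on the TimeZone setting, and it lets the planner pre-evaluate calls with constant arguments. A sketch (epoch_ms is a made-up helper name):

-- wrapper declared immutable so the planner may fold a call with constant
-- arguments into a single pre-computed value
create function epoch_ms(ts timestamptz) returns bigint
language sql immutable
as $$ select extract(EPOCH FROM ts)::bigint * 1000 $$;

-- usage against the original table:
-- where s.sampletimestamp >= epoch_ms('2013-08-11 00:00:00+10')
--   and s.sampletimestamp <  epoch_ms('2013-08-12 00:00:00+10')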

I don't have time to work on the optimizer enhancement discussed above at the moment, but you might want to consider raising it on the mailing list.

answered 2013-08-29T07:37:39.717