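Warning: I am not deleting this answer, since it seems technically correct and may therefore be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want anyway. With an ORDER BY in the window, aggregate functions do not compute over the partition as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see the proof at the end of this answer).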
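To make the difference concrete, here is a minimal Python sketch of the semantics involved. It assumes PostgreSQL's documented default frame: with an ORDER BY, the frame runs from the start of the partition through the current row and its peers (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); without one, it is the whole partition. The function window_avg and the sample rows are hypothetical and only emulate the behavior; this is not how PostgreSQL implements it.

```python
from collections import defaultdict


def window_avg(rows, ordered):
    """Emulate avg(foo) OVER (PARTITION BY bar [ORDER BY foo]).

    rows is a list of (foo, bar) tuples.
    Returns a list of (foo, bar, avg) tuples sorted by (bar, foo).
    """
    # Group the foo values by their partition key bar.
    partitions = defaultdict(list)
    for foo, bar in rows:
        partitions[bar].append(foo)

    result = []
    for bar in sorted(partitions):
        values = sorted(partitions[bar])
        for foo in values:
            if ordered:
                # Default frame with ORDER BY: from the partition start
                # through the current row and its peers (equal foo values).
                frame = [v for v in values if v <= foo]
            else:
                # No ORDER BY: the frame is the whole partition.
                frame = values
            result.append((foo, bar, sum(frame) / len(frame)))
    return result


rows = [(1, 1), (2, 2), (3, 1), (4, 2)]
whole = window_avg(rows, ordered=False)
running = window_avg(rows, ordered=True)
print(whole)
print(running)
```

With ordered=False every row of a partition gets the same average; with ordered=True the average becomes a running one, which is the behavior the warning is about.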
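Although it does not improve performance per se, if you use the same partition several times you probably want the second syntax proposed by astander, and not only because it is shorter to write. Here is why.
Consider the following query: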
SELECT
    array_agg(foo)
        OVER (PARTITION BY bar ORDER BY foo),
    avg(baz)
        OVER (PARTITION BY bar ORDER BY foo)
FROM
    foobar;
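Since, in principle, the ordering has no effect on the computation of the average (but see the warning at the top of this answer), you might be tempted to use the following query instead, with no ordering on the second window: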
SELECT
    array_agg(foo)
        OVER (PARTITION BY bar ORDER BY foo),
    avg(baz)
        OVER (PARTITION BY bar)
FROM
    foobar;
This is a big mistake, because it takes considerably longer. Proof:
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
  -> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
       Sort Key: bar, foo
       Sort Method: quicksort Memory: 130006kB
       -> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
Total runtime: 2458.969 ms
(6 lignes)
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
  -> WindowAgg (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
       -> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
            Sort Key: bar, foo
            Sort Method: quicksort Memory: 130006kB
            -> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
Total runtime: 3060.041 ms
(7 lignes)
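Now, if you are aware of this issue, you will of course use the same window definition everywhere. But when the same partition appears ten times or more and you keep editing the query over several days, it is easy to forget to add the ORDER BY clause to a window that does not need it on its own.
This is where the WINDOW syntax comes in: it prevents this kind of careless mistake (provided, of course, that you know it is better to minimize the number of distinct window definitions).
The following is strictly equivalent to the first query (as far as I can tell from EXPLAIN ANALYZE):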
SELECT
    array_agg(foo)
        OVER qux,
    avg(baz)
        OVER qux
FROM
    foobar
WINDOW
    qux AS (PARTITION BY bar ORDER BY foo);
Update regarding the warning:
I understand that the claim that SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) may seem questionable, so here is an example:
# SELECT * FROM foobar;
foo | bar
-----+-----
1 | 1
2 | 2
3 | 1
4 | 2
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
array_agg | avg
-----------+-----
{1,3} | 2
{1,3} | 2
{2,4} | 3
{2,4} | 3
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
array_agg | avg
-----------+-----
{1} | 1
{1,3} | 2
{2} | 2
{2,4} | 3
(4 lines)
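If you want to keep the ORDER BY in a shared window but still average over the whole partition, one option is to spell out the frame explicitly. The sketch below is only an illustration under stated assumptions: it runs the SQL through Python's built-in sqlite3 module instead of PostgreSQL so that it is self-contained, and it assumes the bundled SQLite is version 3.25 or later (the first with window functions). SQLite has no array_agg, so only avg is shown. The window names running and whole are hypothetical. The frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is standard SQL that PostgreSQL accepts as well.

```python
import sqlite3

# In-memory database holding the same four rows as the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (foo INTEGER, bar INTEGER)")
conn.executemany(
    "INSERT INTO foobar VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2)],
)

query = """
SELECT
    foo,
    bar,
    avg(foo) OVER running AS running_avg,
    avg(foo) OVER whole AS whole_avg
FROM
    foobar
WINDOW
    -- default frame: partition start through the current row
    running AS (PARTITION BY bar ORDER BY foo),
    -- explicit frame: the entire partition, despite the ORDER BY
    whole AS (PARTITION BY bar ORDER BY foo
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY
    bar, foo
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The running_avg column reproduces the running averages from the last query above, while whole_avg matches the query without ORDER BY. I have not measured whether the two differing frames cost an extra WindowAgg step in PostgreSQL, so check with EXPLAIN ANALYZE before relying on this for performance.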