This question is primarily about an issue in older versions of PrestoSQL, which has been resolved in the (now renamed) Trino project as of version 346. However, Amazon's Athena is based on Presto versions 0.217 (Athena Engine 2) and 0.172 (Athena Engine 1), which do have the issues described below. This question was written specifically around Athena Engine 1 / PrestoSQL version 0.172.
Questions (tl;dr)
- What is the difference between `ROWS BETWEEN` and `RANGE BETWEEN` in Presto window functions?
  - Are these just synonyms for each other, or are there core conceptual differences?
  - If they are just synonyms, why does `ROWS BETWEEN` allow more options than `RANGE BETWEEN`?
- Is there a query scenario where it's possible to use the exact same parameters on `ROWS BETWEEN` and `RANGE BETWEEN` and get different results?
  - If using just `unbounded`/`current row`, is there a scenario where you'd use `RANGE` instead of `ROWS` (or vice-versa)?
- Since `ROWS` has more options, why isn't it mentioned at all in the documentation? o_O
Comments
The Presto documentation is fairly quiet even about `RANGE`, and doesn't mention `ROWS`. I haven't found many discussions or examples around window functions in Presto. I'm starting to sift through the Presto code base to try to figure this out. Hopefully someone can save me from that, and we can improve the documentation together.
The Presto code has a parser and test cases for the `ROWS` variant, but there's no mention of `ROWS` in the documentation.
The test cases I found with both `ROWS` and `RANGE` don't test anything different between the two syntaxes.
They almost look like synonyms, but they do behave differently in my testing, and have different allowed parameters and validation rules.
The following examples can be run with the starburstdata/presto Docker image running Presto 0.213-e-0.1. Typically I run Presto 0.172 through Amazon Athena, and have almost always ended up using `ROWS`.
RANGE
RANGE seems to be limited to "UNBOUNDED" and "CURRENT ROW". The following returns an error:
range between 1 preceding and 1 following
use tpch.tiny;
select custkey, orderdate,
    array_agg(orderdate) over (
        partition by custkey
        order by orderdate asc
        range between 1 preceding and 1 following
    ) previous_orders
from orders where custkey in (419, 320) and orderdate < date('1996-01-01')
order by custkey, orderdate asc;
ERROR:
Window frame RANGE PRECEDING is only supported with UNBOUNDED
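For context, standard SQL defines an offset in a `RANGE` frame in terms of the sort key's value rather than a count of rows. A minimal sketch of that distinction, assuming Trino 346 or later (where this restriction is reported to be lifted) and a hypothetical inline table with a numeric column `k`; on Presto 0.172 I would expect the `range` column to fail with the error above:

```sql
-- Hypothetical data: k has gaps, so a value-based frame and a
-- row-based frame should cover different sets of rows.
select k,
       array_agg(k) over (order by k rows  between 1 preceding and 1 following) as rows_frame,
       array_agg(k) over (order by k range between 1 preceding and 1 following) as range_frame
from (values 1, 2, 4, 5, 9) as t(k)
order by k;
```

Under the standard's definition, the `range_frame` for `k = 4` covers key values 3 through 5, so it would hold 4 and 5, whereas `rows_frame` holds the neighboring rows 2, 4, and 5.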
The following range syntaxes work fine (with the expected differing results). All of the following examples are based on the query above, changing only the frame clause.
range between unbounded preceding and current row
custkey | orderdate | previous_orders
---------+------------+--------------------------------------------------------------------------
320 | 1992-07-10 | [1992-07-10]
320 | 1992-07-30 | [1992-07-10, 1992-07-30]
320 | 1994-07-08 | [1992-07-10, 1992-07-30, 1994-07-08]
320 | 1994-08-04 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04]
320 | 1994-09-18 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18]
320 | 1994-10-12 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
419 | 1992-03-16 | [1992-03-16]
419 | 1993-12-29 | [1992-03-16, 1993-12-29]
419 | 1995-01-30 | [1992-03-16, 1993-12-29, 1995-01-30]
range between current row and unbounded following
custkey | orderdate | previous_orders
---------+------------+--------------------------------------------------------------------------
320 | 1992-07-10 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1992-07-30 | [1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-07-08 | [1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-08-04 | [1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-09-18 | [1994-09-18, 1994-10-12]
320 | 1994-10-12 | [1994-10-12]
419 | 1992-03-16 | [1992-03-16, 1993-12-29, 1995-01-30]
419 | 1993-12-29 | [1993-12-29, 1995-01-30]
419 | 1995-01-30 | [1995-01-30]
range between unbounded preceding and unbounded following
custkey | orderdate | previous_orders
---------+------------+--------------------------------------------------------------------------
320 | 1992-07-10 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1992-07-30 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-07-08 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-08-04 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-09-18 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-10-12 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04, 1994-09-18, 1994-10-12]
419 | 1992-03-16 | [1992-03-16, 1993-12-29, 1995-01-30]
419 | 1993-12-29 | [1992-03-16, 1993-12-29, 1995-01-30]
419 | 1995-01-30 | [1992-03-16, 1993-12-29, 1995-01-30]
ROWS
The three working examples for `RANGE` above all work for `ROWS` and produce identical output.
rows between unbounded preceding and current row
rows between current row and unbounded following
rows between unbounded preceding and unbounded following
output omitted - identical to above
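One likely reason the outputs match is that `orderdate` is unique within each customer in this sample, so no two rows tie on the sort key. Under standard SQL semantics, `RANGE` treats rows with equal `ORDER BY` values as peers and includes all of them in a frame ending at `current row`, while `ROWS` stops at the physical row. A minimal sketch to test this, assuming an inline `values` table with hypothetical columns `k` and `v` (not part of the tpch data), and not verified here on Presto 0.172:

```sql
-- Hypothetical data: the two rows with k = 2 tie on the sort key.
select k, v,
       array_agg(v) over (order by k rows  between unbounded preceding and current row) as rows_frame,
       array_agg(v) over (order by k range between unbounded preceding and current row) as range_frame
from (values (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd')) as t(k, v)
order by k, v;
```

If Presto follows the standard here, both rows with `k = 2` should show the same three-element `range_frame`, while `rows_frame` for one of them would hold only two elements.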
However, `ROWS` allows for far more control, since you can also use the syntax above that fails with `range`:
rows between 1 preceding and 1 following
custkey | orderdate | previous_orders
---------+------------+--------------------------------------
320 | 1992-07-10 | [1992-07-10, 1992-07-30]
320 | 1992-07-30 | [1992-07-10, 1992-07-30, 1994-07-08]
320 | 1994-07-08 | [1992-07-30, 1994-07-08, 1994-08-04]
320 | 1994-08-04 | [1994-07-08, 1994-08-04, 1994-09-18]
320 | 1994-09-18 | [1994-08-04, 1994-09-18, 1994-10-12]
320 | 1994-10-12 | [1994-09-18, 1994-10-12]
419 | 1992-03-16 | [1992-03-16, 1993-12-29]
419 | 1993-12-29 | [1992-03-16, 1993-12-29, 1995-01-30]
419 | 1995-01-30 | [1993-12-29, 1995-01-30]
rows between current row and 1 following
custkey | orderdate | previous_orders
---------+------------+--------------------------
320 | 1992-07-10 | [1992-07-10, 1992-07-30]
320 | 1992-07-30 | [1992-07-30, 1994-07-08]
320 | 1994-07-08 | [1994-07-08, 1994-08-04]
320 | 1994-08-04 | [1994-08-04, 1994-09-18]
320 | 1994-09-18 | [1994-09-18, 1994-10-12]
320 | 1994-10-12 | [1994-10-12]
419 | 1992-03-16 | [1992-03-16, 1993-12-29]
419 | 1993-12-29 | [1993-12-29, 1995-01-30]
419 | 1995-01-30 | [1995-01-30]
rows between 5 preceding and 2 preceding
custkey | orderdate | previous_orders
---------+------------+--------------------------------------------------
320 | 1992-07-10 | NULL
320 | 1992-07-30 | NULL
320 | 1994-07-08 | [1992-07-10]
320 | 1994-08-04 | [1992-07-10, 1992-07-30]
320 | 1994-09-18 | [1992-07-10, 1992-07-30, 1994-07-08]
320 | 1994-10-12 | [1992-07-10, 1992-07-30, 1994-07-08, 1994-08-04]
419 | 1992-03-16 | NULL
419 | 1993-12-29 | NULL
419 | 1995-01-30 | [1992-03-16]