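Given a table with one row per product_id per day dt, along with the turnover generated on that day, how can I efficiently compute a column turnover_7day containing the turnover that product generated over the last 7 days?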
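I found a simple query that works as expected, but it is slow, and I am trying to run it for millions of products over a period of several years.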
SQL Fiddle
(Even though the Fiddle is PostgreSQL, IRL I am trying to do this on Snowflake; I doubt there are features in Snowflake that would completely change the potential answers to this post.)
Dataset:
TABLE turnover_per_day:
| product_id | product_name | dt | turnover |
|------------|--------------|------------|----------|
| 1 | PS5 | 2021-10-22 | 85 |
| 1 | PS5 | 2021-10-27 | 100 |
| 1 | PS5 | 2021-11-01 | 110 |
| 1 | PS5 | 2021-11-05 | 150 |
| 2 | XBOX | 2021-11-02 | 10 |
| 2 | XBOX | 2021-11-03 | 15 |
| 2 | XBOX | 2021-11-04 | 13 |
| 2 | XBOX | 2021-11-05 | 11 |
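Method 1: Subquery in the SELECT statement (produces the expected result, very inefficient):
Here I use a subquery on the source table to recompute the turnover for each row. It looks very inefficient, but at least it is easy to understand.
Query: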
SELECT
    t1.product_id
    ,t1.product_name
    ,t1.turnover
    ,t1.dt
    ,( -- correlated subquery: evaluated once per row of t1
        SELECT SUM(t2.turnover)
        FROM turnover_per_day AS t2
        WHERE t2.product_id = t1.product_id
          AND t2.dt BETWEEN t1.dt - interval '6 day' AND t1.dt
    ) AS turnover_7day
FROM turnover_per_day AS t1
ORDER BY t1.product_id, t1.dt
Result (as expected):
| product_id | product_name | turnover | dt | turnover_7day |
|------------|--------------|----------|------------|---------------|
| 1 | PS5 | 85 | 2021-10-22 | 85 |
| 1 | PS5 | 100 | 2021-10-27 | 185 |
| 1 | PS5 | 110 | 2021-11-01 | 210 |
| 1 | PS5 | 150 | 2021-11-05 | 260 |
| 2 | XBOX | 10 | 2021-11-02 | 10 |
| 2 | XBOX | 15 | 2021-11-03 | 25 |
| 2 | XBOX | 13 | 2021-11-04 | 38 |
| 2 | XBOX | 11 | 2021-11-05 | 49 |
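Method 2: Trying to reproduce this answer (but failing)
Here I try to use a window function to speed up the computation. I tried adding PARTITION BY product_id to the answer linked above, but it does not work as expected. My guess is that, because of the LEFT JOIN, the value of the product_id column is NULL for the days without data, so when the window takes the 6 preceding rows it "drops" those NULL rows and the sum ends up covering more than 6 days.
Query: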
WITH days AS ( -- generate a calendar without gaps
    SELECT date_trunc('day', d)::date AS day
    FROM generate_series(CURRENT_DATE - 15, CURRENT_DATE, '1 day'::interval) AS d
)
SELECT
    days.day
    ,t1.product_id
    ,t1.product_name
    ,t1.turnover
    ,t1.dt
    -- the frame counts the 6 preceding rows of the partition, not 6 calendar days
    ,SUM(t1.turnover) OVER (
        PARTITION BY t1.product_id
        ORDER BY dt
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS turnover_7day
FROM days
LEFT JOIN turnover_per_day AS t1
    ON days.day = t1.dt
-- WHERE t1.product_id IS NOT NULL
ORDER BY product_id, dt
Result (wrong):
| day | product_id | product_name | turnover | dt | turnover_7day |
|------------|------------|--------------|----------|------------|---------------|
| 2021-10-22 | 1 | PS5 | 85 | 2021-10-22 | 85 |
| 2021-10-27 | 1 | PS5 | 100 | 2021-10-27 | 185 |
| 2021-11-01 | 1 | PS5 | 110 | 2021-11-01 | 295 |
| 2021-11-05 | 1 | PS5 | 150 | 2021-11-05 | 445 |
| 2021-11-02 | 2 | XBOX | 10 | 2021-11-02 | 10 |
| 2021-11-03 | 2 | XBOX | 15 | 2021-11-03 | 25 |
| 2021-11-04 | 2 | XBOX | 13 | 2021-11-04 | 38 |
| 2021-11-05 | 2 | XBOX | 11 | 2021-11-05 | 49 |
| 2021-10-31 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-29 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-23 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-24 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-25 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-26 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-28 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-21 | (null) | (null) | (null) | (null) | (null) |
| 2021-10-30 | (null) | (null) | (null) | (null) | (null) |
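The result above suggests that the filler days all end up with a NULL product_id, so they form a partition of their own and each product's partition still contains only its real rows. A possible fix, sketched below for PostgreSQL, is to build the calendar per product (calendar × product) before applying the ROWS frame. The CTE names products, grid and rolled are made up for illustration, and the sketch assumes dt is of type date and that each product_id has a single product_name.

```sql
WITH days AS (            -- gap-free calendar spanning the data
    SELECT d::date AS day
    FROM generate_series(
        (SELECT MIN(dt) FROM turnover_per_day),
        (SELECT MAX(dt) FROM turnover_per_day),
        interval '1 day'
    ) AS d
),
products AS (             -- assumption: one product_name per product_id
    SELECT DISTINCT product_id, product_name
    FROM turnover_per_day
),
grid AS (                 -- one row per product per calendar day
    SELECT p.product_id, p.product_name, d.day, t.turnover
    FROM products AS p
    CROSS JOIN days AS d
    LEFT JOIN turnover_per_day AS t
           ON t.product_id = p.product_id
          AND t.dt = d.day
),
rolled AS (
    SELECT
        product_id
        ,product_name
        ,turnover
        ,day AS dt
        -- every partition is now gap-free, so 6 preceding rows = 6 preceding days
        ,SUM(turnover) OVER (
            PARTITION BY product_id
            ORDER BY day
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS turnover_7day
    FROM grid
)
SELECT product_id, product_name, turnover, dt, turnover_7day
FROM rolled
WHERE turnover IS NOT NULL   -- drop the filler days again
ORDER BY product_id, dt;
```

On the sample data this should reproduce the turnover_7day values of Method 1. Note that the grid holds (number of products) × (number of days) rows, so with millions of products over several years it may itself become expensive; restricting each product's calendar to its own first and last dt is one way to shrink it.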
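My questions are:
- How can Method 2 be modified so that it works?
- Is there any other, less data/compute-intensive way to calculate this rolling sum (grouped by product_id)?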
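Regarding the second question, one candidate that avoids both the correlated subquery and the calendar is a RANGE frame with an interval offset, which PostgreSQL supports from version 11. The sketch below assumes dt is of type date; whether Snowflake accepts an interval offset in a RANGE frame is not verified here and should be checked against its documentation.

```sql
SELECT
    product_id
    ,product_name
    ,turnover
    ,dt
    -- RANGE frames compare ORDER BY values, so gaps between dates are handled
    -- by the frame itself: it covers dt - 6 days .. dt, however many rows that is
    ,SUM(turnover) OVER (
        PARTITION BY product_id
        ORDER BY dt
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS turnover_7day
FROM turnover_per_day
ORDER BY product_id, dt;
```

Because the frame is defined on dates rather than row counts, it should match the BETWEEN t1.dt - interval '6 day' AND t1.dt condition of Method 1 while reading the table only once.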