sql - SQL: difference between SUMs calculated over a period

Question

I have a table that looks like this:

CREATE TABLE foobar (
                     id                     SERIAL PRIMARY KEY,
                     data_entry_date        DATE NOT NULL,
                     user_id                INTEGER NOT NULL,
                     wine_glasses_drunk     INTEGER NOT NULL,
                     whisky_shots_drunk     INTEGER NOT NULL,
                     beer_bottle_drunk      INTEGER NOT NULL
                 );

insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-01', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-02', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-03', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-04', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-05', 1, 2,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-07', 1, 1,2,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-08', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-11', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-12', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-13', 1, 2,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-14', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-15', 1, 9,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-16', 1, 0,4,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-17', 1, 0,5,3);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-18', 1, 2,2,5);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-20', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-23', 1, 1,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-24', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-01', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-02', 1, 2,3,4);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-05', 1, 1,2,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-09', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-10', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-11', 1, 3,6,3);

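I want to write a query that shows the difference between the TOTAL wine_glasses_drunk, TOTAL whisky_shots_drunk and TOTAL beer_bottle_drunk for a given period and the corresponding TOTALs for the previous period.

It probably sounds more complicated than it is. If the period is 1 week == 7 days, the query should return the difference between the totals consumed this week and the totals consumed last week.

A slight complication is that the dates in the table are not consecutive, i.e. some dates are missing, so the query needs to find the most relevant dates when working out the boundaries of each period.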

This is what I have so far:

-- using hard coded dates

SELECT (SUM(f1.wine_glasses_drunk) - SUM(f2.wine_glasses_drunk)) as wine_diff, 
(SUM(f1.whisky_shots_drunk) - SUM(f2.whisky_shots_drunk)) as whisky_diff, 
(SUM(f1.beer_bottle_drunk) - SUM(f2.beer_bottle_drunk)) as beer_diff 
FROM foobar f1 INNER JOIN foobar f2 ON f2.user_id=f1.user_id
WHERE f1.user_id=1 
AND f1.data_entry_date BETWEEN '2011-01-08' AND '2011-01-15'
AND f2.data_entry_date BETWEEN '2011-01-01' AND '2011-01-08'
AND f1.data_entry_date - f2.data_entry_date between 6 and 9;

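The SQL above is obviously a hack (especially the f1.data_entry_date - f2.data_entry_date between 6 and 9 criterion). I checked the results in Excel, and the results of the query above are (unsurprisingly) wrong.

How do I write this query, and how can I modify it so that it handles the non-consecutive dates in the database?

I am using PostgreSQL, but I would prefer database-agnostic (i.e. ANSI) SQL if possible.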

score 2 · Accepted Answer

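From the description you have given, I am not entirely sure I am going about this the right way, but I would use two different functions to get the result you want.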

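First, take a look at the date_trunc function. It gives you the date of the first day of the week, which you can group by to get weekly sums. If its first day of the week is not the one you want, you can use date arithmetic to work around that. I believe it treats Monday as the first day of the week.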
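As a quick illustration of the week bucketing described above, the following sketch lists each entry date next to the Monday that date_trunc assigns it to. The second computed column, sunday_week_start, is a hypothetical variation (not part of the original answer) showing how date arithmetic could shift the week to start on Sunday. It assumes PostgreSQL, where adding an integer to a date adds that many days.

```sql
-- PostgreSQL sketch: which week does each row fall into?
select
  data_entry_date,
  -- date_trunc('week', ...) returns the Monday of the ISO week (as a timestamp),
  -- so it is cast back to a date here.
  date_trunc('week', data_entry_date)::date as monday_week_start,
  -- Assumed variation: step forward one day, truncate, step back one day,
  -- so that weeks run Sunday to Saturday instead.
  (date_trunc('week', data_entry_date + 1) - interval '1 day')::date as sunday_week_start
from foobar
order by data_entry_date;
```

With the sample data, 2011-01-01 (a Saturday) and 2011-01-02 (a Sunday) both map to Monday 2010-12-27 in the first column, while in the second column 2011-01-02 starts a new Sunday-based week.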

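Second, you can use the lag window function to find the sums from the previous row. Note that if a week is missing, this function looks at the previous row, not necessarily the previous week. I added a check to the query to make sure the database is looking at the correct row.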

select 
  user_id,
  week_start_date,
  this_week_wine_glasses_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_wine_glasses_drunk else 0 end as wine_glasses_drunk,
  this_week_whisky_shots_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_whisky_shots_drunk else 0 end as whisky_shots_drunk,
  this_week_beer_bottle_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_beer_bottle_drunk else 0 end as beer_bottle_drunk
from (
select
  user_id,
  week_start_date,
  this_week_wine_glasses_drunk,
  this_week_whisky_shots_drunk,
  this_week_beer_bottle_drunk,
  case when (lag(week_start_date)
    over (partition by user_id order by week_start_date)  + interval '7' day)
      = week_start_date then 'TRUE' end as is_consecutive_weeks,
  lag(this_week_wine_glasses_drunk) 
    over (partition by user_id order by week_start_date) as last_week_wine_glasses_drunk,
  lag(this_week_whisky_shots_drunk) 
    over (partition by user_id order by week_start_date) as last_week_whisky_shots_drunk,
  lag(this_week_beer_bottle_drunk) 
    over (partition by user_id order by week_start_date) as last_week_beer_bottle_drunk
from (
  select
    user_id,
    date_trunc('week', data_entry_date) as week_start_date,
    sum(wine_glasses_drunk) as this_week_wine_glasses_drunk,
    sum(whisky_shots_drunk) as this_week_whisky_shots_drunk,
    sum(beer_bottle_drunk) as this_week_beer_bottle_drunk
  from foobar
  group by user_id,
    date_trunc('week', data_entry_date)
  ) a
) b
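To see why the is_consecutive_weeks check in the query above matters, here is a minimal sketch of lag on made-up weekly totals. The values list and the names w, week_start and total are purely illustrative, and the week starting 2011-01-17 is deliberately missing.

```sql
-- PostgreSQL sketch with hypothetical data: lag() returns the previous ROW
-- in the window ordering, whatever its date happens to be.
select
  week_start,
  total,
  lag(total)      over (order by week_start) as previous_row_total,
  lag(week_start) over (order by week_start) as previous_row_week
from (
  values
    (date '2011-01-03', 10),
    (date '2011-01-10', 7),
    (date '2011-01-24', 4)   -- no row for the week of 2011-01-17
) as w(week_start, total)
order by week_start;
```

For the 2011-01-24 row, previous_row_total is 7, taken from the week of 2011-01-10, which is two weeks earlier. Comparing the previous row's week start plus 7 days with the current week start, as the main query does, is what detects that gap.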

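There is an SQL Fiddle for you to look at.

By the way, I come from an Oracle background and hacked this together using the PostgreSQL documentation and SQL Fiddle. Hopefully it is what you need.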

score 1 · Accepted Answer

A slightly different approach (I will let you fill in the date parameters):

-- SQL Server (T-SQL) variable syntax; each variable is declared with its own type.
DECLARE @StartDate1 DATE, @EndDate1 DATE, @StartDate2 DATE, @EndDate2 DATE
Set @StartDate1='6/1/2012'
Set @EndDate1='6/15/2012'
Set @StartDate2='6/16/2012'
Set @EndDate2='6/30/2012'

SELECT SUM(U.WineP1)-SUM(U.WineP2) AS WineDiff, SUM(U.WhiskeyP1)-SUM(U.WhiskeyP2) AS WhiskeyDiff, SUM(U.BeerP1)-SUM(U.BeerP2) AS BeerDiff
FROM
(
SELECT SUM(wine_glasses_drunk) AS WineP1, SUM(whisky_shots_drunk) AS WhiskeyP1, SUM(beer_bottle_drunk) AS BeerP1, 0 AS WineP2, 0 AS WhiskeyP2, 0 AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate1 AND @EndDate1

UNION ALL

SELECT 0 AS WineP1, 0 AS WhiskeyP1, 0 AS BeerP1, SUM(wine_glasses_drunk) AS WineP2, SUM(whisky_shots_drunk) AS WhiskeyP2, SUM(beer_bottle_drunk) AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate2 AND @EndDate2
) AS U
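The variable syntax above is SQL Server's, so it will not run on PostgreSQL as written. As a hedged sketch of the same two-period comparison in more portable SQL, the dates can live in a CTE and each period can be summed with a CASE expression (conditional aggregation, a swapped-in technique rather than the UNION ALL used above). The CTE name params, its column names, the user_id filter and the example dates are assumptions chosen to match the sample data in the question. The result is the later period minus the earlier one.

```sql
with params as (
  -- Hypothetical example periods: two consecutive 7-day windows.
  select date '2011-01-01' as start1, date '2011-01-07' as end1,
         date '2011-01-08' as start2, date '2011-01-14' as end2
)
select
  -- Each SUM only counts rows that fall inside its own period.
  sum(case when f.data_entry_date between p.start2 and p.end2
           then f.wine_glasses_drunk else 0 end)
    - sum(case when f.data_entry_date between p.start1 and p.end1
               then f.wine_glasses_drunk else 0 end) as wine_diff,
  sum(case when f.data_entry_date between p.start2 and p.end2
           then f.whisky_shots_drunk else 0 end)
    - sum(case when f.data_entry_date between p.start1 and p.end1
               then f.whisky_shots_drunk else 0 end) as whisky_diff,
  sum(case when f.data_entry_date between p.start2 and p.end2
           then f.beer_bottle_drunk else 0 end)
    - sum(case when f.data_entry_date between p.start1 and p.end1
               then f.beer_bottle_drunk else 0 end) as beer_diff
from foobar f
  cross join params p
where f.user_id = 1
  and f.data_entry_date between p.start1 and p.end2;
```

Against the sample rows in the question this should give wine_diff = 0, whisky_diff = -2 and beer_diff = -1 (totals of 9, 1 and 5 for the second week versus 9, 3 and 6 for the first). Missing dates need no special handling here, since a day without a row simply contributes nothing to either sum.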

score 0 · Accepted Answer

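As a general rule when developing queries like this, build them up in pieces and then combine the pieces. First find a good structure, then build each part you need separately, so that you understand how each part works on its own.

Here, I think you will need a few more subqueries to get to a clean solution. You could try something like the following:

Work out the date ranges you need and save them as variables. (You might want to add days to a date to find the next period, rather than the approach in the code you provided above.)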

Declare @SQL1 Date, @SQL2 Date, @SQL3 Date
Set @SQL1=(SQL1)
...

Next, find the totals for each week in a way that uses the dates as parameters.

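Select 
  sum(wine_glasses_drunk) as wine_totals, 
  sum(whisky_shots_drunk) as whiskey_totals, 
  sum(beer_bottle_drunk) as beer_totals,
  -- A row dated exactly @SQL2 lands in period 1, because the first matching WHEN wins.
  case 
    when data_entry_date between @SQL1 and @SQL2 then 1
    when data_entry_date between @SQL2 and @SQL3 then 2
  end as period_number
from foobar
-- Group on the same CASE expression so that each period gets one row of totals.
group by
  case 
    when data_entry_date between @SQL1 and @SQL2 then 1
    when data_entry_date between @SQL2 and @SQL3 then 2
  end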

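Then build the summary query you need around this, since the data is now in an easy format and you no longer need so many sums over the same values.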
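A minimal sketch of what such a summary query could look like, assuming PostgreSQL-compatible SQL with the three boundary dates held in a CTE instead of variables. The names bounds, d1, d2, d3, period_totals, cur and prev, the user_id filter and the example dates are all hypothetical. Half-open ranges (on or after the start, before the next start) are used so that a boundary date belongs to exactly one period.

```sql
with bounds as (
  -- Hypothetical boundaries: period 1 is [d1, d2), period 2 is [d2, d3).
  select date '2011-01-01' as d1, date '2011-01-08' as d2, date '2011-01-15' as d3
),
period_totals as (
  select
    case when f.data_entry_date < b.d2 then 1 else 2 end as period_number,
    sum(f.wine_glasses_drunk) as wine_totals,
    sum(f.whisky_shots_drunk) as whisky_totals,
    sum(f.beer_bottle_drunk)  as beer_totals
  from foobar f
    cross join bounds b
  where f.user_id = 1
    and f.data_entry_date >= b.d1
    and f.data_entry_date <  b.d3
  group by case when f.data_entry_date < b.d2 then 1 else 2 end
)
-- Pair each period with the one before it and subtract the totals.
select
  cur.wine_totals   - prev.wine_totals   as wine_diff,
  cur.whisky_totals - prev.whisky_totals as whisky_diff,
  cur.beer_totals   - prev.beer_totals   as beer_diff
from period_totals cur
  join period_totals prev on prev.period_number = cur.period_number - 1;
```

If either period has no rows at all, the inner join returns nothing; a left join combined with coalesce(..., 0) would be one way to treat an empty period as zero.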

score 0 · Accepted Answer

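I was going to add this as an edit to my other answer, but it really is a different approach, so it should be a separate answer.

I think I prefer the other answer I gave, but this one should work even if there are gaps in the data.

To set the query's parameters, change the values of period_start_date and period_days in the query_params part of the with clause.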

with query_params as (
  select 
    date '2011-01-01' as period_start_date,
    7 as period_days
),
summary_data as (
select
  user_id,
  (data_entry_date - period_start_date)/period_days as period_number,
  sum(wine_glasses_drunk) as wine_glasses_drunk,
  sum(whisky_shots_drunk) as whisky_shots_drunk,
  sum(beer_bottle_drunk) as beer_bottle_drunk
from foobar
  cross join query_params
group by user_id,
  (data_entry_date - period_start_date)/period_days
)
select
  user_id,
  period_number,
  period_start_date + period_number * period_days as period_start_date,
  sum(wine_glasses_drunk) as wine_glasses_drunk,
  sum(whisky_shots_drunk) as whisky_shots_drunk,
  sum(beer_bottle_drunk) as beer_bottle_drunk
from (
  -- this weeks data
  select 
    user_id,
    period_number,
    wine_glasses_drunk,
    whisky_shots_drunk,
    beer_bottle_drunk
  from summary_data
  union all
  -- last weeks data
  select 
    user_id,
    period_number + 1 as period_number,
    -wine_glasses_drunk as wine_glasses_drunk,
    -whisky_shots_drunk as whisky_shots_drunk,
    -beer_bottle_drunk as beer_bottle_drunk
  from summary_data
) a
cross join query_params
where period_number <= (select max(period_number) from summary_data)
group by 
  user_id,
  period_number,
  period_start_date + period_number * period_days
order by 1, 2

Once again, an SQL Fiddle is available.
