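I have a data set of email addresses and the dates on which those addresses were added to a table. An email address can have multiple entries on different dates. For example, given the data set below, I want to get each date together with the count of distinct emails we have between that date and 3 days earlier.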

Date   | email  
-------+----------------
1/1/12 | test@test.com
1/1/12 | test1@test.com
1/1/12 | test2@test.com
1/2/12 | test1@test.com
1/2/12 | test2@test.com
1/3/12 | test@test.com
1/4/12 | test@test.com
1/5/12 | test@test.com
1/5/12 | test@test.com
1/6/12 | test@test.com
1/6/12 | test@test.com
1/6/12 | test1@test.com

With a date period of 3, the result set would look like this:

date   | count(distinct email)
-------+------
1/1/12 | 3
1/2/12 | 3
1/3/12 | 3
1/4/12 | 3
1/5/12 | 1
1/6/12 | 2

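I can get a distinct count for a date range with the query below, but I want a count over a range for each day, so that I don't have to update the range manually for hundreds of dates.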

select test.date, count(distinct test.email)  
from test_table as test  
where test.date between '2012-01-01' and '2012-05-08'  
group by test.date;
5 Answers

Test case:

CREATE TABLE tbl (date date, email text);
INSERT INTO tbl VALUES
  ('2012-01-01', 'test@test.com')
, ('2012-01-01', 'test1@test.com')
, ('2012-01-01', 'test2@test.com')
, ('2012-01-02', 'test1@test.com')
, ('2012-01-02', 'test2@test.com')
, ('2012-01-03', 'test@test.com')
, ('2012-01-04', 'test@test.com')
, ('2012-01-05', 'test@test.com')
, ('2012-01-05', 'test@test.com')
, ('2012-01-06', 'test@test.com')
, ('2012-01-06', 'test@test.com')
, ('2012-01-06', 'test1@test.com')
;

Query - returns only days for which an entry exists in tbl:

SELECT date
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  date BETWEEN t.date - 2 AND t.date -- period of 3 days
      ) AS dist_emails
FROM   tbl t
WHERE  date BETWEEN '2012-01-01' AND '2012-01-06'  
GROUP  BY 1
ORDER  BY 1;

Or - returns all days in the specified range, even if there are no rows for a day:

SELECT date
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  date BETWEEN g.date - 2 AND g.date
      ) AS dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval  '1 day')::date) AS g(date);

Result:

date       | dist_emails
-----------+------------
2012-01-01 | 3
2012-01-02 | 3
2012-01-03 | 3
2012-01-04 | 3
2012-01-05 | 1
2012-01-06 | 2
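
As a cross-check of the result above, here is a minimal Python sketch that recomputes the same 3-day sliding distinct count outside the database. The helper name `sliding_distinct` and the `ROWS` list are assumptions made for illustration; the rows mirror the test case, and the window is taken as the day itself plus the two days before it.

```python
from datetime import date, timedelta

# Sample rows mirroring the test case: (date, email).
ROWS = [
    (date(2012, 1, 1), "test@test.com"),
    (date(2012, 1, 1), "test1@test.com"),
    (date(2012, 1, 1), "test2@test.com"),
    (date(2012, 1, 2), "test1@test.com"),
    (date(2012, 1, 2), "test2@test.com"),
    (date(2012, 1, 3), "test@test.com"),
    (date(2012, 1, 4), "test@test.com"),
    (date(2012, 1, 5), "test@test.com"),
    (date(2012, 1, 5), "test@test.com"),
    (date(2012, 1, 6), "test@test.com"),
    (date(2012, 1, 6), "test@test.com"),
    (date(2012, 1, 6), "test1@test.com"),
]


def sliding_distinct(rows, first, last, days=3):
    """Hypothetical helper: distinct emails in [d - (days - 1), d] for each day d."""
    result = {}
    d = first
    while d <= last:
        start = d - timedelta(days=days - 1)
        # A set keeps each email once, like count(DISTINCT email).
        result[d] = len({email for day, email in rows if start <= day <= d})
        d += timedelta(days=1)
    return result


counts = sliding_distinct(ROWS, date(2012, 1, 1), date(2012, 1, 6))
for d, n in counts.items():
    print(d.isoformat(), n)
```

The loop visits every calendar day in the range, so it corresponds to the generate_series variant rather than the one that only lists days present in tbl.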

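At first this sounded like a job for window functions, but I did not find a way to define a suitable window frame. Also, per the documentation: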

Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.

So I solved it with correlated subqueries instead. I think that is the smartest way.

By the way, "between that date and 3 days earlier" would be a period of 4 days, not 3. Your definition is contradictory there.
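
The off-by-one comes from BETWEEN being inclusive on both ends. A small Python sketch of the arithmetic, where `days_covered` is a hypothetical helper and not part of any query above:

```python
from datetime import date, timedelta


def days_covered(d, back):
    """Hypothetical helper: the days matched by BETWEEN d - back AND d (inclusive)."""
    return [d - timedelta(days=k) for k in range(back, -1, -1)]


d = date(2012, 1, 6)
print(len(days_covered(d, 2)))  # 3 days: Jan 4, 5, 6
print(len(days_covered(d, 3)))  # 4 days: Jan 3, 4, 5, 6
```

This is why the queries subtract 2, not 3, to get a period of 3 days.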

Slightly shorter, but slower once the range covers more than a few days:

SELECT g.date, count(DISTINCT email) AS dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval  '1 day')::date) AS g(date)
LEFT   JOIN tbl t ON t.date BETWEEN g.date - 2 AND g.date
GROUP  BY 1
ORDER  BY 1;
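
For readers without a PostgreSQL instance at hand, the join variant can be approximated with Python's built-in sqlite3 module. This is a sketch under clear assumptions: it uses SQLite's dialect (a recursive CTE instead of generate_series, and date(x, '-2 days') instead of date - 2), the column is named day here, and the series runs to 2012-01-09 to show what happens on days without rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (day TEXT, email TEXT)")  # ISO dates stored as text
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        ("2012-01-01", "test@test.com"),
        ("2012-01-01", "test1@test.com"),
        ("2012-01-01", "test2@test.com"),
        ("2012-01-02", "test1@test.com"),
        ("2012-01-02", "test2@test.com"),
        ("2012-01-03", "test@test.com"),
        ("2012-01-04", "test@test.com"),
        ("2012-01-05", "test@test.com"),
        ("2012-01-05", "test@test.com"),
        ("2012-01-06", "test@test.com"),
        ("2012-01-06", "test@test.com"),
        ("2012-01-06", "test1@test.com"),
    ],
)

# SQLite stand-in for generate_series: one row per calendar day.
QUERY = """
WITH RECURSIVE g(day) AS (
    SELECT '2012-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM g WHERE day < '2012-01-09'
)
SELECT g.day, count(DISTINCT t.email) AS dist_emails
FROM   g
LEFT   JOIN tbl t ON t.day BETWEEN date(g.day, '-2 days') AND g.day
GROUP  BY g.day
ORDER  BY g.day
"""

rows = conn.execute(QUERY).fetchall()
for day, dist_emails in rows:
    print(day, dist_emails)
```

Because of the LEFT JOIN, a day whose window contains no rows still appears in the output with a count of 0, since count(DISTINCT ...) ignores the NULL email produced by the unmatched join.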

Answered 2012-05-11T03:33:54.310

A lateral join is useful for this kind of "sliding window" requirement, like so:

SELECT
       t.date
     , ljl.dist_emails
FROM   tbl t
LEFT JOIN LATERAL (
        SELECT
               count(DISTINCT email) AS dist_emails
        FROM   tbl
        WHERE  date BETWEEN t.date - 2 AND t.date -- period of 3 days
       ) AS ljl ON TRUE
WHERE  t.date BETWEEN '2012-01-01' AND '2012-01-06'
GROUP  BY 1, 2  -- one row per day; tbl holds several rows for some days
ORDER  BY 1;

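Note that this is a variation of Erwin Brandstetter's earlier query. I was surprised he did not suggest it, since lateral joins are a good fit for this kind of requirement.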

Answered 2021-04-02T03:01:39.443

In SQL Server:

`select test.date, count(distinct test.email) from test_table as test  where convert(date,test.date) between '2012-01-01' and '2012-05-08' group by test.date`

Hope this helps.

Answered 2012-05-11T01:40:26.887

Instead of specifying the dates, you can always use the dateadd function:

test.date > dateadd(dd,-7,getdate())
Answered 2012-10-17T11:52:54.510

An example of a sliding-window distinct count:

SELECT b.day, count(DISTINCT a.user_id)
from glip_production.presences_1d a,
 (SELECT distinct(day), TIMESTAMPADD(day,-6, day) dt_start
  from glip_production.presences_1d t1) b
where a.day >= b.dt_start and a.day <= b.day and b.day > '2017-11-01'
group by b.day
Answered 2017-11-10T19:06:20.417