I have a Postgres 9.1 database. I'm trying to generate weekly record counts (for a given date range) and compare them with the previous year.

I have the following code to generate the series:

select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
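
For reference, here is a rough Python analogue of that generate_series() call. date_series is a hypothetical helper. Postgres itself returns timestamps here, while this sketch uses plain datetime.date values. It yields the five week starts from 2013-01-01 through 2013-01-29.

```python
from datetime import date, timedelta
from typing import List


def date_series(start: date, stop: date, step: timedelta) -> List[date]:
    """Rough analogue of generate_series(start, stop, step) for dates (hypothetical helper)."""
    values = []
    current = start
    while current <= stop:  # stop is inclusive, as in generate_series
        values.append(current)
        current += step
    return values


weeks = date_series(date(2013, 1, 1), date(2013, 1, 31), timedelta(days=7))
for week_start in weeks:
    print(week_start.isoformat())
```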

However, I'm not sure how to join the counted records to the generated dates.

Take the following records as an example:

Pt_ID      exam_date
======     =========
1          2012-01-02
2          2012-01-02
3          2012-01-08
4          2012-01-08
1          2013-01-02
2          2013-01-02
3          2013-01-03
4          2013-01-04
1          2013-01-08
2          2013-01-10
3          2013-01-15
4          2013-01-24

I'd like the records returned as:

  series        thisyr      lastyr
===========     =====       =====
2013-01-01        4           2
2013-01-08        3           2
2013-01-15        1           0
2013-01-22        1           0
2013-01-29        0           0

I'm not sure how to reference the date range in the subquery. Thanks for any help.

2 Answers

The simple way is to solve this with a CROSS JOIN, as demonstrated by @jpw. However, there are some hidden problems:

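  1. The performance of an unconditional CROSS JOIN deteriorates quickly as the number of rows grows. The total number of rows is multiplied by the number of weeks you are testing before this huge derived table can be processed in the aggregation. Indexes can't help.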

  2. Weeks starting on January 1 lead to inconsistencies. ISO weeks may be an alternative. See below.
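
A small illustration of point 1 with the question's sample data. It only counts row pairs and says nothing about real query plans: an unconditional CROSS JOIN produces rows × weeks pairs before aggregation, while a range condition keeps only the pairs where the date falls inside the week.

```python
from datetime import date, timedelta
from itertools import product

# Sample exam_date values from the question.
exam_dates = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]
week_starts = [date(2013, 1, 1) + timedelta(days=7 * i) for i in range(5)]

# Unconditional CROSS JOIN: every row is paired with every week.
cross_pairs = list(product(exam_dates, week_starts))

# Range join: a row is only paired with the week whose [start, start + 7) contains it.
# (Filtered from the cross product here just to count it; an index lookup never builds it.)
range_pairs = [(d, w) for d, w in cross_pairs if w <= d < w + timedelta(days=7)]

print(len(cross_pairs), len(range_pairs))
```

The cross product grows with both factors: twice as many rows or twice as many weeks means twice as many pairs to aggregate.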

All of the following queries make heavy use of an index on exam_date. Be sure to have one.

Join to relevant rows only

Should be faster:

SELECT d.day, d.thisyr
     , count(t.exam_date) AS lastyr
FROM  (
   SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0  -- for 2nd join
        , count(t.exam_date) AS thisyr
   FROM   generate_series('2013-01-01'::date
                        , '2013-01-31'::date  -- last week overlaps with Feb.
                        , '7 days'::interval) d(day)  -- returns timestamp
   LEFT   JOIN tbl t ON t.exam_date >= d.day::date
                    AND t.exam_date <  d.day::date + 7
   GROUP  BY d.day
   ) d
LEFT   JOIN tbl t ON t.exam_date >= d.day0      -- repeat with last year
                 AND t.exam_date <  d.day0 + 7
GROUP  BY d.day, d.thisyr
ORDER  BY d.day;
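
The same logic, sketched in Python for the question's sample data. A sorted list stands in for the index on exam_date, and bisect_left performs the two lookups per week over the half-open interval [day, day + 7). Shifting the week start back one year mirrors d.day - '1 year'::interval. Postgres clamps Feb 29 to Feb 28 in that case, and the helper imitates this.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Sorted exam_date values stand in for a b-tree index on exam_date.
exam_dates = sorted([
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
])
WEEK = timedelta(days=7)


def minus_one_year(d: date) -> date:
    """Shift back one year; Feb 29 becomes Feb 28, as in Postgres date arithmetic."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def count_in_range(sorted_dates, lo: date, hi: date) -> int:
    """Count dates with lo <= d < hi via two binary searches (like an index range scan)."""
    return bisect_left(sorted_dates, hi) - bisect_left(sorted_dates, lo)


rows = []
day = date(2013, 1, 1)
while day <= date(2013, 1, 31):
    day0 = minus_one_year(day)  # same week start, one year earlier
    rows.append((day,
                 count_in_range(exam_dates, day, day + WEEK),     # thisyr
                 count_in_range(exam_dates, day0, day0 + WEEK)))  # lastyr
    day += WEEK

for week_start, thisyr, lastyr in rows:
    print(week_start.isoformat(), thisyr, lastyr)
```

For the sample data this gives 4/2, 2/2, 1/0, 1/0 and 0/0, i.e. 2 rather than 3 for the 2013-01-08 week.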

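This uses weeks starting on January 1, like your original. As commented, this produces some inconsistencies: each year's weeks start on a different weekday, and since we cut off at the end of the year, the last week of the year consists of only 1 day (2 in leap years).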
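
Both inconsistencies are easy to check in Python: January 1 falls on a different weekday each year, and the remainder after 52 full weeks is the length of the truncated last week. last_week_length is a hypothetical helper name.

```python
import calendar
from datetime import date


def last_week_length(year: int) -> int:
    """Days left over after 52 full weeks counted from January 1 (hypothetical helper)."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return days_in_year - 52 * 7


for year in (2012, 2013):
    # weekday(): Monday == 0 ... Sunday == 6
    print(year, calendar.day_name[date(year, 1, 1).weekday()], last_week_length(year))
```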

The same with ISO weeks

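Depending on your requirements, consider ISO weeks instead. They start on Monday and always span 7 days, but they cross the border between years. Per the documentation on EXTRACT():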

week

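The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.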

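In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.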
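
Python's date.isocalendar() follows the same ISO 8601 rules, so the documentation's examples can be checked with it. Its ISO year plays the role of isoyear and its ISO week the role of week:

```python
from datetime import date

examples = [date(2005, 1, 1), date(2006, 1, 1), date(2012, 12, 31)]
for d in examples:
    iso_year, iso_week, iso_weekday = d.isocalendar()  # iso_weekday: Monday == 1
    print(d.isoformat(), "-> ISO year", iso_year, "week", iso_week)
```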

The above query, rewritten with ISO weeks:

SELECT d.w + 1 AS isoweek  -- w = 0 is ISO week 1
     , day::text  AS thisyr_monday, thisyr_ct
     , day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM  (
   SELECT w, day
        , date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
        , count(t.exam_date) AS thisyr_ct
   FROM  (
      SELECT w
           , date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
      FROM   generate_series(0, 4) w
      ) d
   LEFT   JOIN tbl t ON t.exam_date >= d.day
                    AND t.exam_date <  d.day + 7
   GROUP  BY d.w, d.day
   ) d
LEFT   JOIN tbl t ON t.exam_date >= d.day0     -- repeat with last year
                 AND t.exam_date <  d.day0 + 7
GROUP  BY d.w, d.day, d.day0, d.thisyr_ct
ORDER  BY d.w, d.day;
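
A Python sketch of the bucketing in this query for the question's sample data. iso_week1_monday and weekly_counts are hypothetical names, and the buckets are labeled 1 to 5 by ISO week number.

```python
from datetime import date, timedelta

exam_dates = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]
WEEK = timedelta(days=7)


def iso_week1_monday(year: int) -> date:
    """Monday of ISO week 1, like date_trunc('week', 'YYYY-01-04'::date)::date."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=jan4.weekday())  # weekday(): Monday == 0


def weekly_counts(year: int, n_weeks: int = 5):
    """(ISO week, Monday, count) for the first n_weeks ISO weeks of `year`."""
    start = iso_week1_monday(year)
    result = []
    for w in range(n_weeks):  # w = 0 is ISO week 1, as with generate_series(0, 4)
        monday = start + timedelta(days=7 * w)
        ct = sum(1 for d in exam_dates if monday <= d < monday + WEEK)
        result.append((w + 1, monday, ct))
    return result


for this_row, last_row in zip(weekly_counts(2013), weekly_counts(2012)):
    print(this_row, last_row)
```

Last year's first ISO week (2012-01-02 to 2012-01-08) contains all four 2012 rows, so the last-year counts differ from the January 1-based version.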

January 4 is always in the first ISO week of the year, so this expression gets the date of the Monday of the first ISO week of a given year:

date_trunc('week', '2012-01-04'::date)::date
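
The claim that January 4 always lies in ISO week 1 can be brute-forced in Python over an arbitrary range of years. The Monday of January 4's week can also be cross-checked against date.fromisocalendar(), which is available since Python 3.8:

```python
from datetime import date, timedelta

YEARS = range(1900, 2100)  # arbitrary test range

# January 4 is always in ISO week 1 ...
jan4_in_week1 = all(date(y, 1, 4).isocalendar()[1] == 1 for y in YEARS)

# ... so stepping back from January 4 to its Monday gives the Monday of ISO week 1.
monday_matches = all(
    date(y, 1, 4) - timedelta(days=date(y, 1, 4).weekday()) == date.fromisocalendar(y, 1, 1)
    for y in YEARS
)
print(jan4_in_week1, monday_matches)
```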

Simplify with EXTRACT()

Since ISO week numbers coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short form:

SELECT w AS isoweek
     , COALESCE(thisyr_ct, 0) AS thisyr_ct
     , COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM   generate_series(1, 5) w
LEFT   JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
   FROM   tbl
   WHERE  EXTRACT(isoyear FROM exam_date)::int = 2013
   GROUP  BY 1
   ) t13  USING (w)
LEFT   JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
   FROM   tbl
   WHERE  EXTRACT(isoyear FROM exam_date)::int = 2012
   GROUP  BY 1
   ) t12  USING (w);
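
The short form boils down to grouping by (ISO year, ISO week) and filling missing weeks with 0. Here is a Python sketch with the question's sample data, where a Counter plays the role of the two aggregate subqueries and .get(..., 0) the role of COALESCE. Like the SQL, it computes the ISO year and week for every row before filtering.

```python
from collections import Counter
from datetime import date

exam_dates = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]

# Key each row by (ISO year, ISO week), like EXTRACT(isoyear ...) and EXTRACT(week ...).
counts = Counter(tuple(d.isocalendar())[:2] for d in exam_dates)

# generate_series(1, 5) LEFT JOIN ... with COALESCE(ct, 0): missing weeks count as 0.
report = [(w, counts.get((2013, w), 0), counts.get((2012, w), 0)) for w in range(1, 6)]
for isoweek, thisyr_ct, lastyr_ct in report:
    print(isoweek, thisyr_ct, lastyr_ct)
```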

Optimized query

The same, with more details and optimized for performance:

WITH params AS (          -- enter parameters here, once 
   SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
        , date_trunc('week', '2013-01-04'::date)::date AS this_start
        , date_trunc('week', '2014-01-04'::date)::date AS next_start
        , 1 AS week_1
        , 5 AS week_n     -- show weeks 1 - 5
   )
SELECT w.w AS isoweek
     , p.this_start + 7 * (w - 1) AS thisyr_monday
     , COALESCE(t13.ct, 0) AS thisyr_ct
     , p.last_start + 7 * (w - 1) AS lastyr_monday
     , COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
   , generate_series(p.week_1, p.week_n) w(w)
LEFT   JOIN (
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM   tbl t, params p
   WHERE  t.exam_date >= p.this_start + 7 * (p.week_1 - 1)  -- only relevant dates
   AND    t.exam_date <  p.this_start + 7 * p.week_n
-- AND    t.exam_date <  p.next_start      -- don't cross over into next year
   GROUP  BY 1
   ) t13  USING (w)
LEFT   JOIN (                              -- same for last year
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM   tbl t, params p
   WHERE  t.exam_date >= p.last_start + 7 * (p.week_1 - 1)
   AND    t.exam_date <  p.last_start + 7 * p.week_n
-- AND    t.exam_date <  p.this_start
   GROUP  BY 1
   ) t12  USING (w);
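
A sketch of the parameter handling in Python. iso_week_range and weekly_counts are hypothetical names. It assumes the filtered range should cover ISO weeks week_1 through week_n, so the lower bound moves with week_1; with week_1 = 1 that is the same range as this_start + 7 * (week_n - week_1 + 1). As in the query, only the slice of a sorted list inside [lo, hi) is touched, which is what an index on exam_date makes cheap.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date, timedelta

exam_dates = sorted([
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
])


def iso_week_range(year: int, week_1: int, week_n: int):
    """Half-open [lo, hi) covering ISO weeks week_1..week_n of `year` (assumed semantics)."""
    week1_monday = date.fromisocalendar(year, 1, 1)  # like date_trunc('week', 'YYYY-01-04')
    lo = week1_monday + timedelta(days=7 * (week_1 - 1))
    hi = week1_monday + timedelta(days=7 * week_n)
    return lo, hi


def weekly_counts(year: int, week_1: int, week_n: int):
    lo, hi = iso_week_range(year, week_1, week_n)
    # Only the slice inside [lo, hi) is read, similar to an index range scan.
    relevant = exam_dates[bisect_left(exam_dates, lo):bisect_left(exam_dates, hi)]
    counts = Counter(d.isocalendar()[1] for d in relevant)  # ISO week number
    return [(w, counts.get(w, 0)) for w in range(week_1, week_n + 1)]


print(weekly_counts(2013, 1, 5))
print(weekly_counts(2012, 1, 5))
```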

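With index support, this should be very fast and is easy to adapt to the interval of your choice. The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3 or later.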

SQL Fiddle.

answered 2014-11-10T02:47:40.060

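Using a cross join should work; I'll paste the markdown output from SQL Fiddle below. Your sample output seems to be incorrect for the 2013-01-08 series: thisyr should be 2, not 3. This may not be the best way to do it, though; my PostgreSQL knowledge leaves a lot to be desired.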

SQL Fiddle

PostgreSQL 9.2.4 Schema Setup

CREATE TABLE Table1
    ("Pt_ID" varchar(6), "exam_date" date);

INSERT INTO Table1
    ("Pt_ID", "exam_date")
VALUES
    ('1', '2012-01-02'),('2', '2012-01-02'),
    ('3', '2012-01-08'),('4', '2012-01-08'),
    ('1', '2013-01-02'),('2', '2013-01-02'),
    ('3', '2013-01-03'),('4', '2013-01-04'),
    ('1', '2013-01-08'),('2', '2013-01-10'),
    ('3', '2013-01-15'),('4', '2013-01-24');

Query 1

select 
  series, 
  sum (
    case 
      when exam_date 
        between series and series + '6 day'::interval
      then 1 
      else 0 
    end
  ) as thisyr,
  sum (
    case 
      when exam_date + '1 year'::interval 
        between series and series + '6 day'::interval
      then 1 else 0 
    end
  ) as lastyr

from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series

Results

|                         SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 |      4 |      2 |
| January, 08 2013 00:00:00+0000 |      2 |      2 |
| January, 15 2013 00:00:00+0000 |      1 |      0 |
| January, 22 2013 00:00:00+0000 |      1 |      0 |
| January, 29 2013 00:00:00+0000 |      0 |      0 |
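
Query 1 can be mirrored in Python to double-check the result table, including the 2 for the 2013-01-08 week. Every (series, row) pair is visited, as in the CROSS JOIN, and the inclusive comparisons match BETWEEN series AND series + 6 days. plus_one_year is a hypothetical helper; its Feb 29 handling imitates Postgres, which maps 2012-02-29 + 1 year to 2013-02-28.

```python
from datetime import date, timedelta

exam_dates = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
    date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
    date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24),
]
SIX_DAYS = timedelta(days=6)


def plus_one_year(d: date) -> date:
    """exam_date + '1 year'; Feb 29 maps to Feb 28, as in Postgres."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


series = [date(2013, 1, 1) + timedelta(days=7 * i) for i in range(5)]

# Every (series, row) pair is visited, as in the CROSS JOIN; the sums mirror the CASE expressions.
result = [
    (s,
     sum(1 for d in exam_dates if s <= d <= s + SIX_DAYS),                 # thisyr
     sum(1 for d in exam_dates if s <= plus_one_year(d) <= s + SIX_DAYS))  # lastyr
    for s in series
]
for s, thisyr, lastyr in result:
    print(s.isoformat(), thisyr, lastyr)
```
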
answered 2014-11-09T23:14:10.047