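I have a Postgres table containing a customer ID, a date, and an integer. For each customer ID, I need the average of the top 3 records whose dates fall within the last year. I can do this for a single ID with the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).

One caveat: the maximum is per month, meaning we only look at the highest value in each given month to build our dataset, which is why we extract the month from the date.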

SELECT 
  id,
  round(avg(max),0) 
FROM 
  (
   select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max 
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' AND 
     id=110070 group by id,month,year 
   ORDER BY
     max desc limit 3
   ) AS t 
GROUP BY id;

How can I extend this query to cover all IDs, returning a single average for each ID?

Here is some sample data:

ID     | MaxAttached | Weekending
110070 | 5           | 2011-11-10
110070 | 6           | 2011-11-17
110071 | 4           | 2011-11-10
110071 | 7           | 2011-11-17
110070 | 3           | 2011-12-01
110071 | 8           | 2011-12-01
110070 | 5           | 2012-01-01
110071 | 9           | 2012-01-01

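So, for this sample table, I would expect the following result:

ID     | MaxAttached
110070 | 5
110071 | 8

This averages the highest value in each month for each ID (6, 3, 5 for 110070, which averages to about 4.67 and rounds to 5; 7, 8, 9 for 110071, which averages to 8).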
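
For anyone who wants to reproduce the example, the sample rows above could be loaded with something like the following. This is only a sketch: the column types are assumptions (the question does not state them), and single-row INSERT statements are used because multi-row VALUES lists are not available before PostgreSQL 8.2.

```sql
-- Hypothetical setup: the column types are assumed, not given in the question.
CREATE TABLE myTable (
    id          integer,
    maxattached integer,
    weekending  date
);

-- One INSERT per row so the script also runs on 8.1.
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 6, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 4, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 7, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 3, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 8, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2012-01-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 9, '2012-01-01');
```

Because the sample dates are from 2011 and 2012, a filter of now() - interval '1 year' will exclude them unless the dates are shifted or the interval is widened.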

Note: Postgres version 8.1.15

2 Answers

First, get max(maxattached) for each customer and each month:

SELECT id,
       max(maxattached) as max_att         
FROM myTable 
WHERE weekending >= now() - interval '1 year' 
GROUP BY id, date_trunc('month',weekending);

Next, rank all of each customer's monthly values:

SELECT id,
       max_att,
       row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;

Next, keep the top 3 for each customer:

SELECT id,
       max_att
FROM <previous select here>
WHERE max_att_rank <= 3;

Next, get the avg of those values for each customer:

SELECT id,
       avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;

Finally, put all the queries together and rewrite or simplify them to fit your case.
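
Put together, the four steps might look like the sketch below. It assumes PostgreSQL 8.4 or later, since row_number() and other window functions are not available in earlier versions such as 8.1. The subquery aliases monthly and ranked are names chosen here for illustration.

```sql
-- Sketch only: requires window functions (PostgreSQL 8.4+).
SELECT id,
       round(avg(max_att), 0) AS avg_att
FROM (
    -- Step 2: rank each customer's monthly maxima, highest first.
    SELECT id,
           max_att,
           row_number() OVER (PARTITION BY id ORDER BY max_att DESC) AS max_att_rank
    FROM (
        -- Step 1: one row per customer per calendar month.
        SELECT id,
               max(maxattached) AS max_att
        FROM myTable
        WHERE weekending >= now() - interval '1 year'
        GROUP BY id, date_trunc('month', weekending)
    ) AS monthly
) AS ranked
WHERE max_att_rank <= 3   -- Step 3: keep the top 3 per customer.
GROUP BY id;              -- Step 4: average them per customer.
```

row_number() breaks ties arbitrarily, which does not affect the average here because tied rows carry the same value.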

Update: here is a SQLFiddle with your test data and the queries.

Update 2: here is a query that works on 8.1:

SELECT customer_id,
       (SELECT round(avg(max_att),0)
        FROM (SELECT max(maxattached) as max_att         
              FROM table1
              WHERE weekending >= now() - interval '2 year' 
                AND id = ct.customer_id
              GROUP BY date_trunc('month',weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub 
        ) as avg_att
FROM customer_table ct;

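The idea is to take your initial query and run it once for each customer (customer_table is a table with one row per unique customer id). Note that the query above uses a window of interval '2 year', whereas the question asks for the last year; adjust the interval as needed.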

Here is a SQLFiddle with this query.

Tested only on version 8.3 (8.1 is too old to be available on SQLFiddle).
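
If no separate customer table exists, one possible variation is to derive the list of IDs from the data table itself. This is a sketch under that assumption, written against the question's myTable with a one-year window; the aliases ids, m, and sub are illustrative, and it has not been tested on 8.1.

```sql
-- Sketch: same correlated-subquery approach, with the customer list
-- taken from myTable instead of a dedicated customer table.
SELECT ids.id,
       (SELECT round(avg(max_att), 0)
        FROM (SELECT max(m.maxattached) AS max_att
              FROM myTable m
              WHERE m.weekending >= now() - interval '1 year'
                AND m.id = ids.id
              GROUP BY date_trunc('month', m.weekending)
              ORDER BY max_att DESC
              LIMIT 3) AS sub
       ) AS avg_att
FROM (SELECT DISTINCT id FROM myTable) AS ids;
```

IDs with no rows inside the window still appear in the result, with a NULL average.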

Answered 2013-01-12T18:09:57.620

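Version 8.3

8.3 is the oldest version I have access to, so I can't guarantee that this will work on 8.1.

I'm using a temporary table to work out the top three records.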

CREATE TABLE temp_highest_per_month as
   select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month,
     0 as priority
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year;

UPDATE temp_highest_per_month t
SET priority = 
 (select count(*) from temp_highest_per_month t2
  where t2.id = t.id and 
   (t.max_in_month < t2.max_in_month or
     (t.max_in_month= t2.max_in_month and
      t.year * 12 + t.month > t2.year * 12 + t2.month)));

select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3  -- priority counts the rows ranked ahead, so the top three are 0, 1 and 2
group by id;

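The year and month are included in the priority calculation so that if two months have the same maximum, each still gets its own distinct priority number.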

Version 9.1

Similar to Igor's answer, but I use a WITH clause to split up the steps.

with highest_per_month as
  ( select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year),
  prioritised as
  ( select id, month, year, max_in_month,
    row_number() over (partition by id
                       order by max_in_month desc)
    as priority
    from highest_per_month
   )
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;
Answered 2013-01-12T18:34:15.353