sql - Need to find the average of the top 3 records grouped by ID in SQL

Question

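I have a postgres table containing a customer ID, a date, and an integer. For each customer ID, I need to find the average of the top 3 records whose date falls within the last year. I can do this for a single ID with the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).

One caveat: the top 3 are taken per month, meaning we only look at the highest value in each month to build the data set, which is why the month is extracted from the date.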

SELECT 
  id,
  round(avg(max),0) 
FROM 
  (
   select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max 
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' AND 
     id=110070 group by id,month,year 
   ORDER BY
     max desc limit 3
   ) AS t 
GROUP BY id;

How can I extend this query to cover all IDs and return a single average for each ID?

Here is some sample data:

ID     | MaxAttached | Weekending
110070 | 5           | 2011-11-10
110070 | 6           | 2011-11-17
110071 | 4           | 2011-11-10
110071 | 7           | 2011-11-17
110070 | 3           | 2011-12-01
110071 | 8           | 2011-12-01
110070 | 5           | 2012-01-01
110071 | 9           | 2012-01-01

So, for this sample table, I would expect to get the following result:

ID     | MaxAttached
110070 | 5
110071 | 8

This averages the highest value in each month for every ID (6, 3, 5 for 110070, which gives 4.67 and rounds to 5; 7, 8, 9 for 110071, which gives 8).
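To experiment with the queries in this thread, the sample rows can be loaded with a small setup script. This is only a sketch: the column types are assumptions, since the question does not show the table definition.

```sql
-- Assumed column types; the original table definition is not shown.
CREATE TABLE myTable (
    id          integer,
    maxattached integer,
    weekending  date
);

-- One INSERT per row, because multi-row VALUES lists need PostgreSQL 8.2 or later.
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 6, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 4, '2011-11-10');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 7, '2011-11-17');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 3, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 8, '2011-12-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110070, 5, '2012-01-01');
INSERT INTO myTable (id, maxattached, weekending) VALUES (110071, 9, '2012-01-01');
```

Note that the sample dates are from 2011 and 2012, so a filter such as `weekending >= now() - interval '1 year'` will exclude all of them when run today; widen the interval or shift the dates when testing.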

Note: postgres version 8.1.15

score 4 · Accepted Answer

First, get max(maxattached) for every customer and every month:

SELECT id,
       max(maxattached) as max_att         
FROM myTable 
WHERE weekending >= now() - interval '1 year' 
GROUP BY id, date_trunc('month',weekending);

Next, rank all of those values for each customer:

SELECT id,
       max_att,
       row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;

Next, keep the top 3 for each customer:

SELECT id,
       max_att
FROM <previous select here>
WHERE max_att_rank <= 3;

Next, get the avg of those values for each customer:

SELECT id,
       avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;

Finally, just put all the queries together and rewrite/simplify them for your case.
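One way the four steps might be combined is to nest each query as a derived table inside the next. This is a sketch rather than the answerer's exact query: the aliases monthly and ranked are made up for illustration, and round() is added to match the question. It relies on window functions, which PostgreSQL only supports from 8.4 onward, so it will not run on 8.1.

```sql
-- Sketch: steps 1 to 4 nested into one statement (PostgreSQL 8.4+).
SELECT id,
       round(avg(max_att), 0) AS avg_att          -- step 4: average per customer
FROM (
    SELECT id,
           max_att,
           row_number() OVER (PARTITION BY id
                              ORDER BY max_att DESC) AS max_att_rank  -- step 2: rank
    FROM (
        SELECT id,
               max(maxattached) AS max_att         -- step 1: monthly maximum
        FROM myTable
        WHERE weekending >= now() - interval '1 year'
        GROUP BY id, date_trunc('month', weekending)
    ) AS monthly
) AS ranked
WHERE max_att_rank <= 3                            -- step 3: keep the top 3
GROUP BY id;
```

Because row_number() assigns distinct numbers even to equal values, exactly three rows per customer are kept when ties occur; rank() would behave differently.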

UPDATE: Here is a SQLFiddle with your test data and the queries: SQLFiddle.

UPDATE2: Here is a query that should work on 8.1:

SELECT customer_id,
       (SELECT round(avg(max_att),0)
        FROM (SELECT max(maxattached) as max_att         
              FROM table1
              WHERE weekending >= now() - interval '2 year' 
                AND id = ct.customer_id
              GROUP BY date_trunc('month',weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub 
        ) as avg_att
FROM customer_table ct;

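The idea is to take your initial query and run it once for every customer (customer_table is a table containing each customer id exactly once).

Here is a SQLFiddle with this query: SQLFiddle.

Tested only on version 8.3 (8.1 is too old to be available on SQLFiddle).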
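If no separate customer table exists, the same idea can presumably be applied by deriving the list of ids from the data table itself. The following is a sketch under that assumption, using the table and column names from the question and the one-year window it asks for; it has not been tested on 8.1.

```sql
-- Sketch: run the single-customer query once per distinct id.
-- Assumes the ids are taken from myTable instead of a customer table.
SELECT ct.id,
       (SELECT round(avg(sub.max_att), 0)
        FROM (SELECT max(m.maxattached) AS max_att
              FROM myTable m
              WHERE m.weekending >= now() - interval '1 year'
                AND m.id = ct.id                      -- correlate with the outer row
              GROUP BY date_trunc('month', m.weekending)
              ORDER BY max_att DESC
              LIMIT 3) AS sub
       ) AS avg_att
FROM (SELECT DISTINCT id FROM myTable) AS ct;
```

An id that has no rows inside the date window still appears in the output, with a NULL avg_att.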

score 0 · Accepted Answer

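8.3 version

8.3 is the oldest version I have access to, so I can't guarantee this will work on 8.1.

I'm using a temporary table to work out the best three records.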

CREATE TABLE temp_highest_per_month as
   select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month,
     0 as priority
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year;

UPDATE temp_highest_per_month t
SET priority = 
 (select count(*) from temp_highest_per_month t2
  where t2.id = t.id and 
   (t.max_in_month < t2.max_in_month or
     (t.max_in_month= t2.max_in_month and
      t.year * 12 + t.month > t2.year * 12 + t2.month)));

select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3  -- priority counts the rows ranked above, so the top three are 0, 1 and 2
group by id;

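The year and month are included when calculating the priority, so that if two months have the same maximum value they still get distinct positions in the numbering.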
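A quick way to check the numbering might be to list each customer's rows in priority order. This is only an inspection sketch: if ties are broken as described, no two rows for the same id should share a priority value.

```sql
-- Sketch: inspect the computed priorities per customer.
SELECT id, year, month, max_in_month, priority
FROM temp_highest_per_month
ORDER BY id, priority;

-- The helper table is created with plain CREATE TABLE, so it persists;
-- drop it once the result has been read.
DROP TABLE temp_highest_per_month;
```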

9.1 version

Similar to Igor's answer, but I'm using a WITH clause to split up the steps.

with highest_per_month as
  ( select 
     id,
     extract(month from weekending) as month,
     extract(year from weekending) as year,
     max(maxattached) as max_in_month
   FROM 
     myTable 
   WHERE
     weekending >= now() - interval '1 year' 
   group by id,month,year),
  prioritised as
  ( select id, month, year, max_in_month,
    -- partition by id only: partitioning by month and year as well would rank every row 1
    row_number() over (partition by id
                       order by max_in_month desc)
    as priority
    from highest_per_month
   )
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;
