sql - Limit the number of rows used for an average

Question

I have a query (PostgreSQL), and I want to limit the rows used to calculate the average:

SELECT username,avg(income),count(*) FROM
       Events 
WHERE to_timestamp(eventtimestamp)  >=  '2008-02-23' AND 
      to_timestamp(eventtimestamp) <=   '2009-01-03' and username='Joe'
GROUP BY userid

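Joe has 40 entries, but I want to limit the number of rows used to compute the average income. I know I can add a LIMIT clause at the end of the query, but that limits the output of the whole query, not the rows considered by the avg in the select list. Any hints on how I can tell avg to use only the first n rows?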

For example, this does not work:

SELECT username,avg(income) limit 5,count(*) FROM
       Events 
WHERE to_timestamp(eventtimestamp)  >=  '2008-02-23' AND 
      to_timestamp(eventtimestamp) <=   '2009-01-03' and username='Joe'
GROUP BY userid

The intent is to average over only the first 5 rows.

Thanks!

score 3 · Accepted Answer

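I'm adding an answer for two reasons. First, most of the other answers affect count(*) as well as avg(), which is not part of the question. Second, you may want to do this for multiple users.

So you can try the following. ROW_NUMBER() numbers each user's events from newest to oldest; the CASE expression yields NULL for every row past the first n (the seqnum limit), and avg() ignores NULLs, so only the n most recent incomes are averaged while count(*) still counts every row: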

-- Only each user's 5 most recent rows (seqnum <= 5) feed avg(); count(*) still counts all rows.
SELECT username, avg(case when seqnum <= 5 then income end), count(*)
FROM (select e.*, ROW_NUMBER() over (partition by username order by eventtimestamp desc) as seqnum
      from Events e
      WHERE to_timestamp(eventtimestamp) >= '2008-02-23' AND
            to_timestamp(eventtimestamp) <= '2009-01-03'
     ) e
GROUP BY username
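On PostgreSQL 9.4 or later, the CASE inside avg() can also be written with the aggregate FILTER clause. The sketch below is self-contained: it uses a hypothetical temporary table events_demo with made-up rows instead of the real Events table, and it drops the date filter for brevity.

```sql
-- Hypothetical sample data (not from the question): Joe has 6 events, Ann has 2.
CREATE TEMP TABLE events_demo (
    username       text,
    eventtimestamp bigint,   -- Unix epoch seconds, as in the question
    income         numeric
);
INSERT INTO events_demo VALUES
    ('Joe', 1203724800, 10), ('Joe', 1203811200, 20), ('Joe', 1203897600, 30),
    ('Joe', 1203984000, 40), ('Joe', 1204070400, 50), ('Joe', 1204156800, 60),
    ('Ann', 1203724800, 100), ('Ann', 1203811200, 200);

-- Only each user's 5 most recent rows reach avg(); count(*) still sees every row.
SELECT username,
       avg(income) FILTER (WHERE seqnum <= 5) AS avg_latest_5,
       count(*)                               AS total_rows
FROM (SELECT e.*,
             row_number() OVER (PARTITION BY username
                                ORDER BY eventtimestamp DESC) AS seqnum
      FROM events_demo e) e
GROUP BY username
ORDER BY username;
```

With these rows, Joe's five most recent incomes are 20 through 60, so avg_latest_5 is 40 while total_rows is 6; Ann has only two rows, so both are averaged (150). The FILTER form and the CASE form give the same result; FILTER just states the condition more directly.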

score 2 · Accepted Answer

You can take the average over an inner query:

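SELECT username, avg(income), count(*)
FROM (
  SELECT username, income
  FROM Events
  WHERE to_timestamp(eventtimestamp) BETWEEN '2008-02-23' AND '2009-01-03'
  AND username = 'Joe'
  LIMIT 5) x   -- no ORDER BY here, so which 5 rows are kept is unspecified
GROUP BY username;   -- the subquery exposes username, not userid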

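Also note the simplified BETWEEN, which is inclusive at both ends and therefore matches the original >= ... AND <= ... pair.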

score 1 · Accepted Answer

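If you happen to prefer (or don't mind) a moving average, that is, the average of the 5 rows ending at the current row (one value per row), you can avoid the subselect by using a window function: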

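-- The window's ORDER BY makes "4 preceding" mean the 4 earlier events.
-- No GROUP BY: a window function keeps one output row per event.
select
    username,
    eventtimestamp,
    avg(income) over (order by eventtimestamp rows 4 preceding)
from events
where to_timestamp(eventtimestamp) >= '2008-02-23' and
      to_timestamp(eventtimestamp) <= '2009-01-03' and username = 'Joe'
order by eventtimestamp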

If I understand your comment correctly, you can indeed use count as a window function too:

    count(*) over (order by eventtimestamp rows 4 preceding)

or, if you don't want to count NULLs:

    count(income) over (order by eventtimestamp rows 4 preceding)
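To see what rows 4 preceding does near the start of the range, here is a self-contained sketch on a hypothetical temporary table rolling_demo with six made-up rows for Joe. Note the ORDER BY inside the window definition: without it, which rows count as "preceding" is unspecified.

```sql
-- Hypothetical sample data: six events for Joe, one per day.
CREATE TEMP TABLE rolling_demo (
    username       text,
    eventtimestamp bigint,   -- Unix epoch seconds
    income         numeric
);
INSERT INTO rolling_demo VALUES
    ('Joe', 1203724800, 10), ('Joe', 1203811200, 20), ('Joe', 1203897600, 30),
    ('Joe', 1203984000, 40), ('Joe', 1204070400, 50), ('Joe', 1204156800, 60);

-- Each row averages itself and up to 4 earlier rows; count(income) shows
-- how many non-NULL incomes actually went into that average.
SELECT eventtimestamp,
       income,
       avg(income)   OVER w AS avg_up_to_5,
       count(income) OVER w AS rows_used
FROM rolling_demo
WHERE username = 'Joe'
WINDOW w AS (ORDER BY eventtimestamp ROWS 4 PRECEDING)
ORDER BY eventtimestamp;
```

The first four rows average fewer than five values (10, 15, 20, 25), which rows_used makes visible; from the fifth row on, each average covers exactly five rows, so the last row gives (20 + 30 + 40 + 50 + 60) / 5 = 40.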

score 1 · Accepted Answer

You can use LIMIT in a subselect:

SELECT username, avg(income), count(*) FROM
  (SELECT * FROM Events
   WHERE to_timestamp(eventtimestamp) >= '2008-02-23' AND
         to_timestamp(eventtimestamp) <= '2009-01-03' and username = 'Joe'
   order by to_timestamp(eventtimestamp) desc   -- keep the 10 most recent rows
   LIMIT 10) sub
GROUP BY username;   -- username is selected, so it must be grouped on

score 0 · Accepted Answer

Good answers have already been posted. I'd suggest comparing and sorting on the Unix epoch values directly, like this:

SELECT userid, username, avg(income), count(*)
FROM (
  SELECT userid, username, income
  FROM Events 
  WHERE eventtimestamp BETWEEN date_part('epoch', '2008-02-23'::date) 
      AND date_part('epoch', '2009-01-03'::date)
    AND username='Joe'
  ORDER BY eventtimestamp DESC LIMIT 10) AS q
GROUP BY userid, username;

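Doing it this way, I avoid calling a conversion function on every row. Another approach would be to create a functional (expression) index on to_timestamp(eventtimestamp), but I think my approach is more efficient.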
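The expression-index alternative mentioned above would look roughly like this in PostgreSQL. It is a sketch on a hypothetical temporary table events_idx_demo (the index name is also made up); on the real table only the table name would change. It relies on to_timestamp(double precision) being immutable, which PostgreSQL requires for index expressions; an integer eventtimestamp is implicitly cast to double precision.

```sql
-- Hypothetical table mirroring the question's epoch column.
CREATE TEMP TABLE events_idx_demo (
    userid         integer,
    username       text,
    eventtimestamp bigint,   -- Unix epoch seconds
    income         numeric
);

-- Expression index: the planner can consider it for predicates written
-- exactly as to_timestamp(eventtimestamp) >= ... / <= ...
CREATE INDEX events_idx_demo_ts_idx
    ON events_idx_demo (to_timestamp(eventtimestamp));

-- The question's original filter shape, which this index can serve.
SELECT username, avg(income), count(*)
FROM events_idx_demo
WHERE to_timestamp(eventtimestamp) >= '2008-02-23'
  AND to_timestamp(eventtimestamp) <= '2009-01-03'
  AND username = 'Joe'
GROUP BY username;
```

Whether this beats comparing raw epoch values depends on the data; an ordinary index on eventtimestamp already serves the epoch-based query above without any expression.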

Note that I've included both userid and username; the original example raises an error because username is not in the GROUP BY clause.

Also, if you want to compute the average from a random sample rather than the last n entries, you can change the ordering to ORDER BY random().
