sql - PostgreSQL: running count of rows for a query 'by minute'

Question

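I need to query, for each minute, the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute: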

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute

score 107 · Accepted Answer

Only return minutes with activity

Shortest

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;

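Use date_trunc(), it returns exactly what you need.

Don't include id in the query, since you want to GROUP BY minute slices.

count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

And that happens to be exactly what you need.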
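
To see what the default frame does with peers, here is a minimal sketch on hypothetical sample data (the CTE name sample and its timestamps are made up for illustration, they are not from the question). It compares the default frame, the same frame spelled out, and a ROWS frame:

```sql
-- Hypothetical sample data: three events in minute 10:00, one in minute 10:01.
WITH sample("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:20')
        , (timestamp '2024-01-01 10:00:59')
        , (timestamp '2024-01-01 10:01:30')
   )
SELECT "when"
       -- default frame: counts up to the last peer of the current row
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS default_frame
       -- the same frame, written out explicitly
     , count(*) OVER (ORDER BY date_trunc('minute', "when")
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_range
       -- ROWS frame: counts row by row, peers are not lumped together
     , count(*) OVER (ORDER BY date_trunc('minute', "when")
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
FROM   sample
ORDER  BY "when";
```

With the RANGE frame, all three rows of the first minute report 3 and the row in the second minute reports 4. With ROWS, the peers within a minute get different counts, so DISTINCT could no longer collapse them to one row per minute.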

Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
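
A small sketch of the difference, using a made-up three-row set with one NULL id (the CTE name sample is an assumption for illustration):

```sql
-- Hypothetical data: two rows with an id, one row where id is NULL.
WITH sample(id) AS (
   VALUES (1), (2), (NULL)
   )
SELECT count(*)  AS all_rows      -- 3: every row counts
     , count(id) AS non_null_ids  -- 2: the NULL id is skipped
FROM   sample;
```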

You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
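
The order of evaluation can be illustrated with a sketch on hypothetical data (the CTE sample and the column alias groups_so_far are made up here). Once GROUP BY has run, the window function only sees one row per minute:

```sql
-- Hypothetical sample data: three events in minute 10:00, one in minute 10:01.
WITH sample("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:20')
        , (timestamp '2024-01-01 10:00:59')
        , (timestamp '2024-01-01 10:01:30')
   )
SELECT date_trunc('minute', "when") AS minute
     , count(*) AS minute_ct  -- plain aggregate: rows per minute
       -- window function runs after aggregation: counts grouped rows, not base rows
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS groups_so_far
FROM   sample
GROUP  BY 1
ORDER  BY 1;
```

This returns minute_ct 3 and 1, but groups_so_far 1 and 2 rather than the desired running counts 3 and 4.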

ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.

Use to_char() if you need to format the result. Like:

SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY date_trunc('minute', "when");

Fastest

SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;

Much like the above, but:

I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

Use sum() as window aggregate function now to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.

Include minutes without activity

Shortest

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):

SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;

Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.
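
As a standalone sketch of the series generation, with literal timestamps standing in for the aggregated min and max (the values are made up for illustration):

```sql
-- Lower bound already truncated to the minute; the upper bound need not be,
-- because the series stops at the last step that does not exceed it.
SELECT generate_series(timestamp '2024-01-01 10:00'
                     , timestamp '2024-01-01 10:02:30'
                     , interval '1 min') AS minute;
```

This yields three rows: 10:00, 10:01 and 10:02. That is why only the lower bound is wrapped in date_trunc() in the query above.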

LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.

Fastest

With CTE:

WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) 
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;

Again, aggregate and count rows per minute in the first step, which removes the need for a later DISTINCT.

Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
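
A minimal sketch of that difference, run on a single made-up NULL value (the aliases t and x are assumptions for illustration):

```sql
-- One row whose only value is NULL.
SELECT count(x)              AS ct             -- 0
     , sum(x)                AS total          -- NULL
     , COALESCE(sum(x), 0)   AS total_or_zero  -- 0
FROM  (VALUES (NULL::int)) t(x);
```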

With many rows and an index on "when" this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
