Consider the following event data in PostgreSQL 9.4:
eventTime | eventName
2015-09-25 18:00:00 | 'AAA'
2015-09-25 17:00:00 | 'BBB'
2015-09-25 16:00:00 | 'BBB'
2015-09-25 15:00:00 | 'BBB'
2015-09-25 14:00:00 | 'AAA'
2015-09-26 13:00:00 | 'CCC'
2015-09-26 12:00:00 | 'AAA'
2015-09-26 11:00:00 | 'BBB'
2015-09-26 10:00:00 | 'CCC'
2015-09-26 09:00:00 | 'BBB'
2015-09-27 08:00:00 | 'AAA'
2015-09-27 07:00:00 | 'CCC'
2015-09-27 05:00:00 | 'CCC'
2015-09-27 04:00:00 | 'CCC'
2015-09-27 03:00:00 | 'CCC'
2015-09-27 02:00:00 | 'AAA'
While a table based on a single count() is simple, for example:
SELECT eventTime, count(1)
FROM (SELECT data->>'eventName' AS eventName,
             date_trunc('day', to_timestamp(data->>'timestamp', 'YYYY-MM-DDZHH24:MI:SS.MS')::timestamp without time zone) AS eventTime
      FROM sidetrack
      WHERE (data->>'eventName' = 'AAA') IS TRUE) AS tmptab
GROUP BY eventTime
ORDER BY eventTime ASC
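it can only count the occurrences of a single value of eventName. I am not very experienced with SQL and am struggling to find a way to create a two-way frequency table. In this example, the result would be: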
day | 'AAA' | 'BBB' | 'CCC'
------------+-------+-------+-------
2015-09-25 | 2 | 3 | 0
2015-09-26 | 1 | 2 | 2
2015-09-27 | 2 | 0 | 4
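One possible way to get a table of this shape is conditional aggregation: group by day and count each event name separately. The sketch below is only an illustration under stated assumptions. It inlines the sample rows from above in a hypothetical `events` CTE instead of reading the real `sidetrack` table, and it relies on the aggregate `FILTER` clause, which is available from PostgreSQL 9.4 onward. The set of event names has to be known in advance, since each one becomes a hard-coded column.

```sql
-- Hypothetical inline copy of the sample data; the real rows live in sidetrack.
WITH events(eventTime, eventName) AS (
    VALUES
        (timestamp '2015-09-25 18:00:00', 'AAA'),
        ('2015-09-25 17:00:00', 'BBB'),
        ('2015-09-25 16:00:00', 'BBB'),
        ('2015-09-25 15:00:00', 'BBB'),
        ('2015-09-25 14:00:00', 'AAA'),
        ('2015-09-26 13:00:00', 'CCC'),
        ('2015-09-26 12:00:00', 'AAA'),
        ('2015-09-26 11:00:00', 'BBB'),
        ('2015-09-26 10:00:00', 'CCC'),
        ('2015-09-26 09:00:00', 'BBB'),
        ('2015-09-27 08:00:00', 'AAA'),
        ('2015-09-27 07:00:00', 'CCC'),
        ('2015-09-27 05:00:00', 'CCC'),
        ('2015-09-27 04:00:00', 'CCC'),
        ('2015-09-27 03:00:00', 'CCC'),
        ('2015-09-27 02:00:00', 'AAA')
)
SELECT date_trunc('day', eventTime)::date           AS day,
       -- One filtered count per event name; days with no match yield 0.
       count(*) FILTER (WHERE eventName = 'AAA')    AS "AAA",
       count(*) FILTER (WHERE eventName = 'BBB')    AS "BBB",
       count(*) FILTER (WHERE eventName = 'CCC')    AS "CCC"
FROM events
GROUP BY day
ORDER BY day;
```

Because `count(*)` over an empty filtered set is 0 rather than NULL, no `coalesce` should be needed for the empty cells.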
Some examples use width_bucket() to count variables with numeric values, but that does not apply to string-valued fields.
I have tried nested selects using WITH, for example:
WITH
foo AS (
SELECT ...
),
bar AS (
SELECT ...
FROM foo
)
SELECT *
FROM bar;
and outer joins, but I have not been able to crack this.
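For reference, the WITH skeleton above could be filled in roughly as follows. This is a sketch, not a tested solution for the real data. It assumes `sidetrack` has a single `jsonb` column named `data` with `eventName` and `timestamp` keys (as the earlier query suggests), and it assumes the timestamps are stored in a form that casts directly to `timestamp`; the three inserted rows are made up for illustration. It uses `sum(CASE ...)` instead of `FILTER`, which should also work on versions older than 9.4 if the column is read as plain `json`.

```sql
-- Hypothetical stand-in for the real table, so the sketch runs on its own.
CREATE TEMP TABLE sidetrack (data jsonb);
INSERT INTO sidetrack (data) VALUES
    ('{"eventName": "AAA", "timestamp": "2015-09-25 18:00:00"}'),
    ('{"eventName": "BBB", "timestamp": "2015-09-25 17:00:00"}'),
    ('{"eventName": "CCC", "timestamp": "2015-09-26 13:00:00"}');

WITH
foo AS (
    -- Step 1: pull the event name and the day out of each JSON document.
    SELECT data->>'eventName' AS eventName,
           date_trunc('day', (data->>'timestamp')::timestamp)::date AS day
    FROM sidetrack
),
bar AS (
    -- Step 2: one row per day, one conditional sum per event name.
    SELECT day,
           sum(CASE WHEN eventName = 'AAA' THEN 1 ELSE 0 END) AS "AAA",
           sum(CASE WHEN eventName = 'BBB' THEN 1 ELSE 0 END) AS "BBB",
           sum(CASE WHEN eventName = 'CCC' THEN 1 ELSE 0 END) AS "CCC"
    FROM foo
    GROUP BY day
)
SELECT *
FROM bar
ORDER BY day;
```

With this shape no outer join is needed, because every day that has at least one event produces exactly one row and missing combinations fall through to the `ELSE 0` branch.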