想象一个如下所示的帐户表:
Column | Type | Modifiers
------------+-----------------------------+-----------
id | bigint | not null
signupdate | timestamp without time zone | not null
canceldate | timestamp without time zone |
我想按月获取注册和取消数量的报告。
在两个查询中执行此操作非常简单,一个用于按月注册,另一个用于按月取消。有没有一种有效的方法可以在单个查询中完成?某些月份的注册和取消可能为零,结果应该显示为零。
使用这样的源数据:
id signupDate cancelDate
1 2012-01-13
2 2012-01-15 2012-02-05
3 2012-03-01 2012-03-20
我们应该得到以下结果:
Date signups cancellations
2012-01 2 0
2012-02 0 1
2012-03 1 1
我正在使用 postgresql 9.0
第一个答案后更新:
Craig Ringer 在下面提供了一个很好的答案。在我的大约 75k 条记录的数据集中,第一个和第三个示例的表现相似。第二个示例似乎在某处有错误,它返回了不正确的结果。
查看解释分析的结果(我的表确实在 signup_date 上有一个索引),第一个查询返回:
Sort (cost=2086062.39..2086062.89 rows=200 width=24) (actual time=863.831..863.833 rows=20 loops=1)
Sort Key: m.m
Sort Method: quicksort Memory: 26kB
InitPlan 2 (returns $1)
-> Result (cost=0.12..0.13 rows=1 width=0) (actual time=0.063..0.064 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.12 rows=1 width=8) (actual time=0.040..0.040 rows=1 loops=1)
-> Index Scan using account_created_idx on account (cost=0.00..8986.92 rows=75759 width=8) (actual time=0.039..0.039 rows=1 loops=1)
Index Cond: (created IS NOT NULL)
InitPlan 3 (returns $2)
-> Aggregate (cost=2991.39..2991.40 rows=1 width=16) (actual time=37.108..37.108 rows=1 loops=1)
-> Seq Scan on account (cost=0.00..2612.59 rows=75759 width=16) (actual time=0.008..14.102 rows=75759 loops=1)
-> HashAggregate (cost=2083057.21..2083063.21 rows=200 width=24) (actual time=863.801..863.806 rows=20 loops=1)
-> Nested Loop (cost=0.00..2077389.49 rows=755696 width=24) (actual time=37.238..805.333 rows=94685 loops=1)
Join Filter: ((date_trunc('month'::text, a.created) = m.m) OR (date_trunc('month'::text, a.terminateddate) = m.m))
-> Function Scan on generate_series m (cost=0.00..10.00 rows=1000 width=8) (actual time=37.193..37.197 rows=20 loops=1)
-> Materialize (cost=0.00..3361.39 rows=75759 width=16) (actual time=0.004..11.916 rows=75759 loops=20)
-> Seq Scan on account a (cost=0.00..2612.59 rows=75759 width=16) (actual time=0.003..24.019 rows=75759 loops=1)
Total runtime: 872.183 ms
第三个查询返回:
Sort (cost=1199951.68..1199952.18 rows=200 width=8) (actual time=732.354..732.355 rows=20 loops=1)
Sort Key: m.m
Sort Method: quicksort Memory: 26kB
InitPlan 4 (returns $2)
-> Result (cost=0.12..0.13 rows=1 width=0) (actual time=0.030..0.030 rows=1 loops=1)
InitPlan 3 (returns $1)
-> Limit (cost=0.00..0.12 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
-> Index Scan using account_created_idx on account (cost=0.00..8986.92 rows=75759 width=8) (actual time=0.022..0.022 rows=1 loops=1)
Index Cond: (created IS NOT NULL)
InitPlan 5 (returns $3)
-> Aggregate (cost=2991.39..2991.40 rows=1 width=16) (actual time=30.212..30.212 rows=1 loops=1)
-> Seq Scan on account (cost=0.00..2612.59 rows=75759 width=16) (actual time=0.004..8.276 rows=75759 loops=1)
-> HashAggregate (cost=12.50..1196952.50 rows=200 width=8) (actual time=65.226..732.321 rows=20 loops=1)
-> Function Scan on generate_series m (cost=0.00..10.00 rows=1000 width=8) (actual time=30.262..30.264 rows=20 loops=1)
SubPlan 1
-> Aggregate (cost=2992.34..2992.35 rows=1 width=8) (actual time=21.098..21.098 rows=1 loops=20)
-> Seq Scan on account (cost=0.00..2991.39 rows=379 width=8) (actual time=0.265..20.720 rows=3788 loops=20)
Filter: (date_trunc('month'::text, created) = $0)
SubPlan 2
-> Aggregate (cost=2992.34..2992.35 rows=1 width=8) (actual time=13.994..13.994 rows=1 loops=20)
-> Seq Scan on account (cost=0.00..2991.39 rows=379 width=8) (actual time=2.363..13.887 rows=998 loops=20)
Filter: (date_trunc('month'::text, terminateddate) = $0)
Total runtime: 732.487 ms
这无疑使第三个查询看起来更快,但是当我使用“时间”命令从命令行运行查询时,第一个查询始终更快,尽管只有几毫秒。
令我惊讶的是,运行两个单独的查询(一个用于计算注册,一个用于计算取消)的速度要快得多。运行时间不到一半,~300ms vs ~730ms。当然,这留下了更多的工作要在外部完成,但就我的目的而言,它仍然可能是最好的解决方案。以下是单个查询:
select
m,
count(a.id) as "signups"
from
generate_series(
(SELECT date_trunc('month',min(signup_date)) FROM accounts),
(SELECT date_trunc('month',greatest(max(signup_date),max(cancel_date))) FROM accounts),
interval '1 month') as m
INNER JOIN accounts a ON (date_trunc('month',a.signup_date) = m)
group by m
order by m
;
select
m,
count(a.id) as "cancellations"
from
generate_series(
(SELECT date_trunc('month',min(signup_date)) FROM accounts),
(SELECT date_trunc('month',greatest(max(signup_date),max(cancel_date))) FROM accounts),
interval '1 month') as m
INNER JOIN accounts a ON (date_trunc('month',a.cancel_date) = m)
group by m
order by m
;
我已将克雷格的答案标记为正确,但如果你能让它更快,我很想听听