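I have a table with a few million rows. I have an expression index on it (I created it in both directions to see whether that made any difference):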
CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC)
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC)
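I am trying to run a query that uses GROUP BY to collect hourly counts of statuses, but only for statuses created today (or in the past 7 days). However, filtering out all entries before a certain date does not use the index; the planner scans and filters every row instead. If I replace the greater-than comparison with an equality comparison, the index is used. I have put the EXPLAIN output below. I hope someone can help me get this query to use the index, or at least improve its performance so that it runs in milliseconds rather than seconds.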
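For concreteness, here is a minimal sketch of the seven-day variant described above. The cutoff expression (`localtimestamp` truncated to the day, minus seven days) is an assumption chosen to stay in `timestamp without time zone`, matching the type shown in the plans below; it has not been run against this table.

```sql
-- Sketch only: hourly status counts for roughly the last 7 days.
-- The cutoff expression is an assumption, not taken from the original query.
EXPLAIN ANALYSE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at)
       >= date_trunc('day', localtimestamp) - interval '7 days';
```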
With an equality comparison, the index is used as expected:
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1)
-> Bitmap Heap Scan on statuses (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1)
Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
-> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1)
Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
Total runtime: 4.416 ms
(6 rows)
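However, as soon as I use greater than (or less than), the query falls back to a sequential scan that filters the whole table without touching the index: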
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1)
-> Seq Scan on statuses (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1)
Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 3620426
Total runtime: 2916.049 ms
(5 rows)
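In this case I can work around the problem by using IN and listing every hour in the range I want to select, but I would really like to understand why the index is not used for the greater-than query.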
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1)
-> Bitmap Heap Scan on statuses (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1)
Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
-> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1)
Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
Total runtime: 7.305 ms
(6 rows)
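The hour list for the IN workaround does not have to be typed by hand. The following is a minimal sketch that builds it with `generate_series`; the example range is arbitrary, and whether the planner still chooses the expression index for this form has not been verified here.

```sql
-- Sketch only: build the list of hours with generate_series instead of
-- writing each literal. The start and end timestamps are example values.
EXPLAIN ANALYSE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) = ANY (ARRAY(
    SELECT generate_series(
        timestamp '2013-02-06 00:00:00',  -- first hour (example value)
        timestamp '2013-02-06 23:00:00',  -- last hour (example value)
        interval '1 hour'
    )
));
```

Comparing the resulting plan with the two-value IN plan above would show whether the Bitmap Index Scan on `statuses_date_trunc_idx1` is still used.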