postgresql - PostgreSQL GROUP BY date_trunc aggregate does not use the index with greater-than

Question

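I have a table with a few million rows. I have an expression index on this table (I created it in both directions to see whether that made a difference):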

CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC);
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC);

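I am trying to run a query that uses GROUP BY to collect hourly status counts, but only for statuses created today (or in the last 7 days). However, filtering out all entries before a certain date does not use the index and instead filters every row. If I replace the greater-than with equals, the index is used. I have included the EXPLAIN output below. I hope someone can help me get this query to use the index, or at least improve performance so that it runs in milliseconds rather than seconds.

With equals, the index is used correctly: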

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00';
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
 Total runtime: 4.416 ms
(6 rows)

However, as soon as I use greater-than (or less-than), the query filters the whole table without using the index.

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1)
   ->  Seq Scan on statuses  (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1)
         Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone)
         Rows Removed by Filter: 3620426
 Total runtime: 2916.049 ms
(5 rows)

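In this case I can work around the problem by using IN and listing every hour in the range I want to select, but I would really like to understand why the index is not used for the greater-than query.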

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00');
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
 Total runtime: 7.305 ms
(6 rows)

score 1 · Accepted Answer

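The planner's row estimate for the statuses table (1222485) is 26 times the actual number of rows (47070) returned by the "bad" query.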
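
One way to check whether stale statistics could explain the misestimate is to look at when the table was last analyzed. This is a minimal sketch, assuming the table is visible through the standard pg_stat_user_tables view:

```sql
-- When were statistics for the table last refreshed, manually or by autovacuum?
SELECT relname,
       last_analyze,      -- last manual ANALYZE
       last_autoanalyze,  -- last ANALYZE run by the autovacuum daemon
       n_live_tup         -- estimated number of live rows
FROM pg_stat_user_tables
WHERE relname = 'statuses';
```

If both timestamps are NULL or old relative to how fast the table grows, the planner is probably working from outdated statistics.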

Try running VACUUM ANALYZE statuses;
If that does not help, raise the statistics target for the statuses.created_at column with ALTER TABLE statuses ALTER created_at SET STATISTICS 500; and analyze again.

This should help.
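
Put together, the steps above might look like the following sketch. The final EXPLAIN ANALYZE reuses the range query from the question to check whether the estimate and the plan have changed; whether the planner then chooses the index still depends on the data.

```sql
-- Refresh planner statistics (and clean up dead rows) for the table.
VACUUM ANALYZE statuses;

-- If the estimate is still far off, collect more detailed statistics
-- for created_at ...
ALTER TABLE statuses ALTER created_at SET STATISTICS 500;
-- ... and re-analyze so the new target takes effect.
ANALYZE statuses;

-- Re-check the plan for the range query from the question.
EXPLAIN ANALYZE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
```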

Edit: You need to check your autovacuum settings.

Read this part of the manual and check your configuration, like this:

SELECT name,setting,source FROM pg_settings WHERE name ~ 'autovacuum';

If your table is too large, you can tune autovacuum_analyze_threshold and/or autovacuum_analyze_scale_factor for it using the ALTER TABLE tab SET (storage_parameter = ...) syntax.
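
As an illustration, a per-table override could look like the sketch below. The values 0.01 and 1000 are assumptions chosen only for the example, not recommendations; suitable numbers depend on how quickly the table changes.

```sql
-- Make autovacuum analyze this table more often than the global default.
-- Both values are illustrative assumptions, not tuned recommendations.
ALTER TABLE statuses SET (
    autovacuum_analyze_scale_factor = 0.01,  -- fraction of the table that must change
    autovacuum_analyze_threshold = 1000      -- minimum number of changed rows
);

-- Confirm that the per-table settings were stored.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'statuses';
```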
