postgresql - Is this the right way to create a partial index in Postgres?

Question

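We have a table with 4 million records, and for one frequently used use case we are only interested in records whose Salesforce userType is 'Standard', which is only about 10,000 of the 4 million. The other possible user types are 'PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess' and 'CsnOnly'.

So for this use case I thought creating a partial index would be better, as per the documentation.

So I am planning to create this partial index to speed up queries for records with a user type of 'Standard' and to prevent requests from the web from timing out: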

CREATE INDEX user_type_idx ON user_table(userType)
WHERE userType NOT IN
   ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');

The lookup query will be

select * from user_table where userType='Standard';

Could you confirm whether this is the right way to create the partial index? It would be of great help.

score 4 · Accepted Answer

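Postgres can use that index, but it does so in a way that is (slightly) less efficient than with an index that specifies where user_type = 'Standard'.

I created a small test table with 4 million rows, 10,000 of which have the user_type 'Standard'. The other values were randomly distributed using the following script: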

create table user_table
(
  id serial primary key,
  some_date date not null,
  user_type text not null,
  some_ts timestamp not null, 
  some_number integer not null, 
  some_data text, 
  some_flag boolean
);

insert into user_table (some_date, user_type, some_ts, some_number, some_data, some_flag)
select current_date, 
       case (random() * 4 + 1)::int
         when 1 then 'PowerPartner'
         when 2 then 'CSPLitePortal'
         when 3 then 'CustomerSuccess'
         when 4 then 'PowerCustomerSuccess'
         when 5 then 'CsnOnly'
       end, 
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,4e6 - 10000) as t(i)
union all 
select current_date, 
       'Standard',
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,10000) as t(i);
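
Before comparing plans, it can help to confirm that the generated data has the intended skew and that the planner statistics are current. The following is a minimal sketch against the test table above; the column alias row_count is only illustrative. Because the script rounds random() * 4 + 1 to an integer, the first and last types ('PowerPartner' and 'CsnOnly') should turn out roughly half as frequent as the three in the middle.

```sql
-- Refresh planner statistics after the bulk insert
analyze user_table;

-- Count rows per user type; the script above inserts
-- 'Standard' exactly 10000 times, the rest is random
select user_type, count(*) as row_count
from user_table
group by user_type
order by row_count desc;
```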

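(I created a table with more than just a few columns, because the planner's choices are also driven by the size and width of the table.)

The first test, for the index using NOT IN: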

create index ix_not_in on user_table(user_type) 
where user_type not in ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');

explain (analyze true, verbose true, buffers true) 
select *
from user_table
where user_type = 'Standard';

Results in:

QUERY PLAN                                                                                                                      
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on stuff.user_table  (cost=139.68..14631.83 rows=11598 width=139) (actual time=1.035..2.171 rows=10000 loops=1)
  Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag                                                  
  Recheck Cond: (user_table.user_type = 'Standard'::text)                                                                       
  Buffers: shared hit=262                                                                                                       
  ->  Bitmap Index Scan on ix_not_in  (cost=0.00..136.79 rows=11598 width=0) (actual time=1.007..1.007 rows=10000 loops=1)      
        Index Cond: (user_table.user_type = 'Standard'::text)                                                                   
        Buffers: shared hit=40                                                                                                  
Total runtime: 2.506 ms

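(The above is a typical execution time after I ran the statement about 10 times to eliminate caching issues.)

As you can see, the planner uses a Bitmap Index Scan, which is a "lossy" scan that needs an extra step to filter out false positives.

When using the following index: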

create index ix_standard on user_table(id) 
where user_type = 'Standard';

This results in the following plan:

QUERY PLAN                                                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_standard on stuff.user_table  (cost=0.29..443.16 rows=10267 width=139) (actual time=0.011..1.498 rows=10000 loops=1)
  Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag                                                          
  Buffers: shared hit=313                                                                                                               
Total runtime: 1.815 ms

Conclusion:

Your index is used, but an index on only the type you are interested in is a bit more efficient.
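
Applied to the table from the question, such an index might look like the sketch below. The index name user_type_standard_idx is hypothetical, and the question does not show the table definition, so the id column is an assumption; any column that the 'Standard' lookups filter or sort on could take its place.

```sql
-- Hypothetical index name; assumes user_table has an id column.
-- The predicate matches the lookup query's WHERE clause exactly.
create index user_type_standard_idx on user_table (id)
where userType = 'Standard';
```

Since the index predicate is the same expression as the condition in the lookup query, the planner does not have to prove anything about a NOT IN list to decide that the index applies.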

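The runtime is not that much different. I executed each explain about 10 times; the average for the ix_standard index was slightly below 2 ms and the average for the ix_not_in index was slightly above 2 ms, so not a real performance difference.

But in general, an Index Scan will scale better with increasing table sizes than a Bitmap Index Scan. This is basically due to the "Recheck Cond", especially if not enough work_mem is available to keep the bitmap in memory (for larger tables).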
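
One way to see the work_mem effect is to shrink the setting for the current session and rerun the query. This is a sketch under some assumptions: it only tells you something if the plan still uses a Bitmap Heap Scan (for example with only ix_not_in present), and the "Heap Blocks: exact=... lossy=..." line it relies on is printed by PostgreSQL 9.4 or later, which is newer than the version that produced the plans above. Whether any lossy blocks appear depends on how many heap pages match; with only 10,000 matching rows the bitmap may still fit.

```sql
-- 64kB is the smallest value work_mem accepts; this only
-- affects the current session
set work_mem = '64kB';

-- On 9.4+, the Bitmap Heap Scan node reports how many heap
-- blocks were tracked exactly and how many became lossy
explain (analyze, buffers)
select *
from user_table
where user_type = 'Standard';

-- Restore the configured default
reset work_mem;
```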

score 1 · Accepted Answer

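For the index to be used, its WHERE condition must appear in the query as you have written it.

PostgreSQL has some ability to make deductions, but it won't be able to infer that userType = 'Standard' is equivalent to the condition in the index.

Use EXPLAIN to determine whether your index can be used.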
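
As a minimal sketch of that check, using the lookup query from the question: if the resulting plan names the partial index in an Index Scan or Bitmap Index Scan node, the index can be used; if it shows only a Seq Scan on user_table, the planner either could not or chose not to use it.

```sql
-- Plain EXPLAIN shows the chosen plan without running the query
explain
select * from user_table where userType = 'Standard';
```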
