
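I created a table in PostgreSQL with a composite primary key (3 columns). If a query uses a subset of those columns that does not include the leading column, the default primary key index is not used. This is not the case when we create an index explicitly (that index is used for any subset).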

By default, Postgres creates an index on the primary key. But as the Postgres documentation says:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

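So if a query does not include the leading column, the index should still be used (and it is, if we create the index explicitly). But when we query on a subset of the columns in the default primary key index, the index is not used.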

Below are the schema and the queries that do not work for subsets.

# \d client_data
              Table "public.client_data"
       Column       |         Type          | Modifiers 
--------------------+-----------------------+-----------
 macaddr            | character varying(64) | not null
 ts                 | bigint                | not null
 interval           | smallint              | not null
 snr                | smallint              | not null
 rx_rate            | bigint                | 
 tx_rate            | bigint                | 
 rx_data            | bigint                | 
 tx_data            | bigint                | 

Indexes:
    "client_data_pkey" PRIMARY KEY, btree (macaddr, ts, interval)

If we specify all the primary key columns, the query planner uses the index:

# explain analyze select count(*) from client_data where macaddr='a:b:c' and ts=346783556 and interval=5;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.60..8.61 rows=1 width=0) (actual time=0.040..0.041 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..8.59 rows=1 width=0) (actual time=0.037..0.037 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a:b:c'::text) AND (ts = 346783556) AND ("interval" = 5))
 Total runtime: 0.096 ms
(4 rows)

But if we specify a subset, the query planner does not use the index:

# explain analyze select count(*) from client_data where ts=346783556;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16176.01..16176.02 rows=1 width=0) (actual time=78.937..78.938 rows=1 loops=1)
   ->  Seq Scan on client_data  (cost=0.00..16175.92 rows=36 width=0) (actual time=78.932..78.932 rows=0 loops=1)
         Filter: (ts = 346783556)
 Total runtime: 78.975 ms
(4 rows)


# explain analyze select count(*) from client_data where ts=346783556 and interval=5;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17639.11..17639.12 rows=1 width=0) (actual time=78.815..78.815 rows=1 loops=1)
   ->  Seq Scan on client_data  (cost=0.00..17639.11 rows=1 width=0) (actual time=78.810..78.810 rows=0 loops=1)
         Filter: ((ts = 346783556) AND ("interval" = 5))
 Total runtime: 78.853 ms
(4 rows)

However, if we use the leading column (macaddr) together with ts or interval, the index is used:

# explain analyze select count(*) from client_data where macaddr='a' and ts=346783556;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.59..8.60 rows=1 width=0) (actual time=0.055..0.056 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..8.59 rows=1 width=0) (actual time=0.051..0.051 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a'::text) AND (ts = 346783556))
 Total runtime: 0.103 ms
(4 rows)


# explain analyze select count(*) from client_data where macaddr='a' and interval=56;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=56.15..56.16 rows=1 width=0) (actual time=0.021..0.022 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..56.15 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a'::text) AND ("interval" = 56))
 Total runtime: 0.055 ms
(4 rows)

1 Answer


You should read the rest of the text after the part you quoted.

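PostgreSQL can only use b-tree indexes efficiently for searches that include the leftmost column(s). If you have an index on (a,b), you can use it for queries that search on a, or on both a and b, but not for queries that search only on b. This is because of the way a multicolumn b-tree index is structured: entries are ordered by a first, so matches on b alone are scattered throughout the index. Most of the index would have to be scanned anyway, so it is generally more efficient for PostgreSQL to just do a full table scan.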

If you need to treat them as discrete columns, and you need to do lots of searches (or fast searches) on b, create a separate index on b as well.
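Applied to the table in the question, a separate index on the non-leading column might look like the sketch below. The index name client_data_ts_idx is hypothetical (it does not appear in the question), and whether the planner actually picks the new index depends on the table's statistics and data distribution, so the plan should be checked rather than assumed.

```sql
-- Hypothetical index name; ts is the second column of client_data_pkey.
CREATE INDEX client_data_ts_idx ON client_data (ts);

-- Refresh planner statistics so the new index is costed realistically.
ANALYZE client_data;

-- Re-run the ts-only query from the question and inspect the plan.
EXPLAIN ANALYZE
SELECT count(*) FROM client_data WHERE ts = 346783556;
```

An extra index adds write overhead on every insert and update, so it is only worth creating if searches on ts alone are frequent.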

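You might find that if you SET enable_seqscan = off (use this only for testing purposes), PostgreSQL will use your index for the non-leftmost columns, but it will probably be slower than a seqscan. If it isn't, you need to check whether your random_page_cost and seq_page_cost settings reflect reality.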
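One way to run that experiment without leaving the setting changed is to scope it to a transaction, as in this minimal sketch. It reuses the ts-only query from the question; the resulting plan and timings are not shown here because they depend on the data and configuration.

```sql
BEGIN;
-- Testing only: discourage sequential scans for this transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(*) FROM client_data WHERE ts = 346783556;

-- SET LOCAL is discarded when the transaction ends.
ROLLBACK;

-- Inspect the cost settings the planner is currently using.
SHOW random_page_cost;
SHOW seq_page_cost;
```

Comparing the actual time of this plan with the Seq Scan plan shows which access path is really faster for this query.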

answered 2013-11-11T06:42:27.097