We have queries of the form

select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)

What index can be built to speed up the WHERE clause?

A btree index created using

create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))

does not seem to be used at all.

EDIT

With enable_seqscan set to off, the query using xpath_exists is much faster (by an order of magnitude), and the plan clearly shows it using the corresponding index (the btree index built on xpath_exists).

Any clue why PostgreSQL would not use the index and instead attempt a much slower sequential scan?

Since I do not want to disable sequential scanning globally, I am back to square one and would welcome suggestions.

EDIT 2 - Explain plans

See below - Cost of first plan (seqscan off) is slightly higher but processing time much faster

b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
   ->  Aggregate  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
         ->  Bitmap Heap Scan on item  (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               ->  Bitmap Index Scan on item_counter_01  (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
                     Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
 Total runtime: 606.136 ms
(7 rows)

plan on explain.depesz.com

b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
   ->  Aggregate  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
         ->  Seq Scan on item  (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               Rows Removed by Filter: 108945
 Total runtime: 10864.242 ms
(6 rows)

plan on explain.depesz.com

1 Answer

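Planner cost parameters

"Cost of first plan (seqscan off) is slightly higher but processing time much faster"

That tells me your random_page_cost and seq_page_cost are probably wrong. You are likely on storage with fast random I/O, either because most of the database is cached in RAM or because you are using an SSD, a SAN with cache, or other storage where random I/O is inherently fast.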
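To see what the planner is currently working with, the two settings can be read back from the session. A minimal sketch; PostgreSQL ships with seq_page_cost = 1.0 and random_page_cost = 4.0, which assumes a random page read costs about four times as much as a sequential one:

```sql
-- Show the page-cost settings in effect for this session,
-- and where each value came from (default, configuration file, session, ...).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost');
```

If both still show the defaults on SSD-backed or mostly cached data, that would be consistent with the planner pricing random I/O too high.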

Try:

SET random_page_cost = 1;
SET seq_page_cost = 1.1;

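to greatly reduce the difference between the cost parameters, then re-run. If that does the job, consider changing those parameters in postgresql.conf.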
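Since changing planner behaviour globally is not wanted, one way to experiment is to scope the change to a single transaction with SET LOCAL, which reverts at COMMIT or ROLLBACK. A sketch using the query from the question; the commented ALTER DATABASE line is only an assumption about how the value might be persisted for the b2box database alone, as an alternative to editing postgresql.conf:

```sql
BEGIN;

-- These values apply only until the end of this transaction.
SET LOCAL random_page_cost = 1;
SET LOCAL seq_page_cost = 1.1;

EXPLAIN ANALYZE
SELECT count(*)
FROM B2HEAD.item
WHERE cluster = 'B2BOX'
  AND xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content)
OFFSET 0 LIMIT 1;

ROLLBACK;

-- Optional: persist for new sessions on one database only.
-- ALTER DATABASE b2box SET random_page_cost = 1;
```

If the plan switches to the bitmap index scan without touching enable_seqscan, the cost parameters were the likely cause.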

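Your row-count estimates are reasonable, so it does not look like a planner mis-estimation problem or a problem with bad table statistics.

Incorrect query

Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you are guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.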
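For the query in the question specifically, count(*) without a GROUP BY always returns exactly one row, so the OFFSET 0 LIMIT 1 clause has no effect on the result and can simply be dropped. A sketch of the same query without it:

```sql
SELECT count(*)
FROM B2HEAD.item
WHERE cluster = 'B2BOX'
  AND xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content);
```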

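You are usually much better off phrasing queries like this as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to pick out the desired value without doing an expensive table scan, or an index scan and sort.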
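As an illustration of that rewrite, suppose the table had a timestamp column with a btree index on it; created_at below is a hypothetical name, not something taken from the question. The aggregate form states the intent directly and does not depend on an implicit row order:

```sql
-- created_at is a hypothetical column used only for illustration.
-- Instead of picking an arbitrary row with OFFSET 0 LIMIT 1:
SELECT min(created_at)
FROM B2HEAD.item;
```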

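Tips

By the way, for future questions, the PostgreSQL wiki has some good information in its performance category, along with a guide to asking slow-query questions.

answered 2013-04-19T01:24:17.233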