sql - Why does Postgres do a sequential scan where the index would return < 1% of the data?

Question

I have 19 years of Oracle and MySQL experience (DBA and dev) and I am new to Postgres, so I may be missing something obvious. But I can not get this query to do what I want.

NOTE: This query is running on an EngineYard Postgres instance. I am not immediately aware of the parameters it has set up. Also, columns applicable_type and status in the items table are of extension type citext.

The following query can take in excess of 60 seconds to return rows:

SELECT items.item_id, 
       CASE when items.sku is null then items.title else concat(item.title, ' (SKU: ', items.sku, ')') END title, 
       items.listing_status, items.updated_at, items.id, 
       items.sku, count(details.id) detail_count 
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id 
                                    and details.applicable_type = 'Item' 
                                    and details.status = 'Valid' 
                LEFT OUTER JOIN products ON products.id = items.product_id
WHERE "items"."user_id" = 3
GROUP BY items.id
ORDER BY title asc
LIMIT 25 OFFSET 0

The details table contains 6.5M rows. The LEFT OUTER JOIN to it does a sequential scan on applicable_id. Cardinality-wise, that column has 120K distinct possibilities across 6.5M rows.

I have a btree index on details with the following columns:

applicable_id
applicable_type
status

but really, applicable_id and applicable_type have low cardinality.

My explain analyze looks like this:

Limit  (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1)
  ->  Sort  (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1)
      Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END)
      Sort Method: top-N heapsort  Memory: 30kB
      ->  HashAggregate  (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1)
          ->  Hash Right Join  (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1)
                Hash Cond: (details.applicable_id = items.id)
                ->  Seq Scan on details  (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1)
                      Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext))
                      Rows Removed by Filter: 942
                ->  Hash  (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 90kB
                      ->  Bitmap Heap Scan on items  (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1)
                            Recheck Cond: (user_id = 3)
                            ->  Bitmap Index Scan on index_items_on_user_id  (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1)
                                  Index Cond: (user_id = 3)

Total runtime: 28781.238 ms

score 3 · Accepted Answer

你有产生标题的表达式的索引吗？更好的是，一个在（user_id，title_expression）上。

如果不是，那可能是一个很好的补充，以便通过索引扫描的前 25 行嵌套循环，看到 Postgres 无法合理地猜测哪个随机 25 行（因此您当前正在进行的 seq 扫描连接表）将是必需的。

score 1 · Accepted Answer

我认为您只需要在appliable_id 列上建立索引（没有appliable_type、status 列）。您可能还需要增加default_statistics_target参数（系统范围或仅适用于 apply_id 列），因此 postgresql 最好猜测加入的行数。

sql - Why does Postgres do a sequential scan where the index would return < 1% of the data?

2 回答 2

Related

Reference