SELECT COUNT(*)
FROM "businesses"
WHERE (businesses.postal_code_id IN
         (SELECT id
          FROM postal_codes
          WHERE lower(city) IN ('los angeles')
            AND lower(region) = 'california'))
  AND (EXISTS
         (SELECT *
          FROM categorizations c
          WHERE c.business_id=businesses.id
            AND c.category_id IN (86)))

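I have a Postgres database of businesses, categories, and locations. This query took 95665.9 ms to execute, and I'm fairly sure the bottleneck is in categorizations. Is there a better way to do this? The resulting count is 1032.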

=# EXPLAIN ANALYZE SELECT COUNT(*)
-# FROM "businesses"
-# WHERE (businesses.postal_code_id IN
(#          (SELECT id
(#           FROM postal_codes
(#           WHERE lower(city) IN ('los angeles')
(#             AND lower(region) = 'california'));
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4007.74..4007.75 rows=1 width=0) (actual time=263820.923..263820.924 rows=1 loops=1)
   ->  Nested Loop  (cost=41.93..4005.20 rows=1015 width=0) (actual time=469.716..263679.865 rows=112513 loops=1)
         ->  HashAggregate  (cost=15.59..15.60 rows=1 width=4) (actual time=332.664..332.946 rows=82 loops=1)
               ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=84.772..332.407 rows=82 loops=1)
                     Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
                     ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=77.530..77.530 rows=0 loops=1)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=22.800..22.800 rows=82 loops=1)
                                 Index Cond: (lower((city)::text) = 'los angeles'::text)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=54.714..54.714 rows=2356 loops=1)
                                 Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=95.926..3208.426 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.08 rows=1015 width=0) (actual time=89.864..89.864 rows=1380 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 263821.016 ms
(15 rows)

And the join version:

EXPLAIN ANALYZE SELECT count(*) FROM businesses
LEFT JOIN postal_codes
ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
AND lower(postal_codes.region) = 'california';

                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4006.14..4006.15 rows=1 width=0) (actual time=143357.170..143357.171 rows=1 loops=1)
   ->  Nested Loop  (cost=37.91..4005.19 rows=381 width=0) (actual time=138.666..143218.064 rows=112514 loops=1)
         ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=0.559..33.957 rows=82 loops=1)
               Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
               ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=0.532..0.532 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=0.058..0.058 rows=82 loops=1)
                           Index Cond: (lower((city)::text) = 'los angeles'::text)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=0.461..0.461 rows=2356 loops=1)
                           Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=55.493..1742.407 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.09 rows=1015 width=0) (actual time=53.141..53.141 rows=1381 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 143357.260 ms

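The simplified query returns a much larger result, but given that there are indexes and I'm only doing one join, I'm surprised it takes so long.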


1 Answer


Try using a functional (expression) index on the column city:

CREATE INDEX ON postal_codes((lower(city)));

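There is a strong dependency between the columns city and region, so sometimes you have to separate these predicates to get more accurate planner estimates. If you need better estimates, you have to add the columns lower_city and lower_region to the table postal_codes, because PostgreSQL has no statistics on indexes.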
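A minimal sketch of the suggestion above, assuming postal_codes.city and postal_codes.region are text or varchar columns. The column names lower_city and lower_region come from the answer; the rest (the one-off UPDATE, the follow-up ANALYZE) is an illustrative assumption. Note that these copies do not maintain themselves: a trigger or application code, not shown here, would have to keep them in sync with city and region.

```sql
-- Store the lowercased values as ordinary columns so that ANALYZE
-- collects plain column statistics for them.
ALTER TABLE postal_codes
    ADD COLUMN lower_city   text,
    ADD COLUMN lower_region text;

-- One-off backfill. Later inserts/updates need a trigger or app code
-- (not shown) to keep these columns consistent with city and region.
UPDATE postal_codes
SET lower_city   = lower(city),
    lower_region = lower(region);

-- Refresh planner statistics so the new columns have data.
ANALYZE postal_codes;

-- Filters now reference the stored columns instead of the expressions.
SELECT id
FROM postal_codes
WHERE lower_city = 'los angeles'
  AND lower_region = 'california';
```

Whether this actually moves the estimate closer to the observed 82 rows is worth confirming with EXPLAIN ANALYZE after the change.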

Post the execution plan here via http://explain.depesz.com/ and, if possible, the EXPLAIN ANALYZE result of your query.
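The plans shown in the question cover only the simplified query; the one that includes the categorizations filter is the plan most worth posting. A sketch of how it might be captured, reusing the question's own query unchanged. The BUFFERS option is an addition not mentioned in the answer: it reports shared-buffer hits versus reads (available since PostgreSQL 9.0), which would show whether the seconds spent per Bitmap Heap Scan on businesses go to disk reads. Keep in mind that ANALYZE really executes the query, so it takes as long as the query itself.

```sql
-- ANALYZE executes the query and reports actual times and row counts;
-- BUFFERS adds shared-buffer hit/read counts per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM businesses
WHERE businesses.postal_code_id IN
        (SELECT id
         FROM postal_codes
         WHERE lower(city) IN ('los angeles')
           AND lower(region) = 'california')
  AND EXISTS
        (SELECT *
         FROM categorizations c
         WHERE c.business_id = businesses.id
           AND c.category_id IN (86));
```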

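PostgreSQL 9.1 should automatically convert correlated subqueries into semi-joins, but I'm not sure. Try rewriting your query from the subquery form into a plain INNER JOIN form (it may not help, but it might).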
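A sketch of the INNER JOIN form of the original query, assuming postal_codes.id is unique (for example, its primary key). Unlike IN and EXISTS, a join emits one row per matching categorizations row, so COUNT(DISTINCT b.id) is used here to keep the result equal to the original count; if (business_id, category_id) is known to be unique in categorizations, a plain COUNT(*) would give the same number and avoid the extra de-duplication step.

```sql
-- IN (subquery) on postal_codes and EXISTS on categorizations,
-- both expressed as inner joins.
SELECT COUNT(DISTINCT b.id)
FROM businesses b
JOIN postal_codes p
  ON p.id = b.postal_code_id          -- assumes postal_codes.id is unique
JOIN categorizations c
  ON c.business_id = b.id
WHERE lower(p.city) = 'los angeles'
  AND lower(p.region) = 'california'
  AND c.category_id = 86;             -- IN (86) with a single value
```

Comparing its EXPLAIN ANALYZE output with the subquery version shows whether the planner chose a different join order or strategy.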

answered 2013-07-17T05:21:37.117