0

我们有一个生产 POSTGRES 9.6 db,有大约 1 亿条记录(LOYALTY)和新表(截至目前<几千)“INFO”

基表(用 django 编写)

class Loyalty(models.Model):
   customer = models.ForeignKey(Customer, db_index=True)
   order = models.ForeignKey(Order, null=True) # i.e. no index!

class Info(models.Model):
    loyalty_adjustment = models.OneToOneField(Loyalty)
    order_number = models.CharField(max_length=50, db_index=True)
    ...

问题 1:

explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    *
FROM
    "loyalty"
    LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072621527905'
    OR "loyalty"."order_id" = 694781500)
LIMIT 21

这会产生一个缓慢的查询:(全扫描)

Limit  (cost=19.23..120.18 rows=21 width=220) (actual time=53814.148..77814.842 rows=1 loops=1)"
  ->  Hash Left Join  (cost=19.23..2858123.35 rows=594498 width=220) (actual time=53814.147..77814.840 rows=1 loops=1)"
        Hash Cond: (loyalty.id = info.info_id)"
        Filter: (((info.order_number)::text = '21072621527905'::text) OR (loyalty.order_id = 694781500))"
        Rows Removed by Filter: 118934642"
        ->  Seq Scan on loyalty  (cost=0.00..2412225.44 rows=118899344 width=50) (actual time=1.001..59578.218 rows=118934643 loops=1)"
        ->  Hash  (cost=14.10..14.10 rows=410 width=170) (actual time=0.508..0.508 rows=4 loops=1)"
              Buckets: 1024  Batches: 1  Memory Usage: 9kB"
              ->  Seq Scan on info  (cost=0.00..14.10 rows=410 width=170) (actual time=0.500..0.500 rows=4 loops=1)"
Planning time: 1.185 ms"
Execution time: 77814.890 ms"

在没有 OR 子句的情况下将查询分成 2 个,使其更快 < 1 秒

问题 2:

explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    *
FROM
    "loyalty"
    LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE
    "info"."order_number" = '21072620001657'
LIMIT 21

在此处输入图像描述

问题 3:

explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    *
FROM
    "loyalty"
    LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072620001657'
    AND "loyalty"."order_id" = 4967472)
LIMIT 21

在此处输入图像描述

  1. 为什么使用 OR 子句使其比使用联合的两个单独查询慢得多?它是否与在 BOTH 表上有条件有关?

  2. 为什么它在 QUERY 3 上运行 INDEX 扫描?因为 LOYALTY 表没有 ORDER 作为其索引

  3. QUERY 2,即使它进行 INDEX 扫描,如果条件确实说明了索引值是什么(这是单个 order_number?),为什么它没有做比全索引扫描更好的事情?

4

0 回答 0