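We have a production PostgreSQL 9.6 database with one table of roughly 100 million rows (LOYALTY) and a new table, INFO, which currently holds fewer than a few thousand rows.
The base tables (defined in Django):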
from django.db import models

class Loyalty(models.Model):
    customer = models.ForeignKey(Customer, db_index=True)
    order = models.ForeignKey(Order, null=True)  # i.e. no index!

class Info(models.Model):
    loyalty_adjustment = models.OneToOneField(Loyalty)
    order_number = models.CharField(max_length=50, db_index=True)
    ...
Query 1:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072621527905'
OR "loyalty"."order_id" = 694781500)
LIMIT 21
This results in a slow query (a full sequential scan of LOYALTY):
Limit (cost=19.23..120.18 rows=21 width=220) (actual time=53814.148..77814.842 rows=1 loops=1)
  -> Hash Left Join (cost=19.23..2858123.35 rows=594498 width=220) (actual time=53814.147..77814.840 rows=1 loops=1)
        Hash Cond: (loyalty.id = info.info_id)
        Filter: (((info.order_number)::text = '21072621527905'::text) OR (loyalty.order_id = 694781500))
        Rows Removed by Filter: 118934642
        -> Seq Scan on loyalty (cost=0.00..2412225.44 rows=118899344 width=50) (actual time=1.001..59578.218 rows=118934643 loops=1)
        -> Hash (cost=14.10..14.10 rows=410 width=170) (actual time=0.508..0.508 rows=4 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 9kB
              -> Seq Scan on info (cost=0.00..14.10 rows=410 width=170) (actual time=0.500..0.500 rows=4 loops=1)
Planning time: 1.185 ms
Execution time: 77814.890 ms
Splitting the query into two queries without the OR clause makes each of them much faster (under 1 second):
Query 2:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE
"info"."order_number" = '21072620001657'
LIMIT 21
and
Query 3:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072620001657'
AND "loyalty"."order_id" = 4967472)
LIMIT 21
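Why does the OR clause make the query so much slower than two separate queries combined with a UNION? Is it related to having conditions on BOTH tables?
Why does Query 3 run an index scan, given that the LOYALTY table has no index on ORDER?
For Query 2, even though it does an index scan, why doesn't it do something better than a full index scan, given that the condition states exactly which index value to look for (a single order_number)?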
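
For reference, here is a minimal sketch of the UNION rewrite referred to in the first question. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the table and column names follow the queries above, and the sample rows and the index name are made up. It checks that the OR form and the UNION form return the same rows; it says nothing about how the PostgreSQL 9.6 planner would execute either one.

```python
import sqlite3

# Assumption: SQLite is used purely as a stand-in so the example is runnable;
# the real tables live in PostgreSQL and hold far more rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loyalty (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_id INTEGER
    );
    CREATE TABLE info (
        id INTEGER PRIMARY KEY,
        loyalty_adjustment_id INTEGER UNIQUE REFERENCES loyalty (id),
        order_number VARCHAR(50)
    );
    -- Hypothetical index name; mirrors db_index=True on Info.order_number.
    CREATE INDEX info_order_number_idx ON info (order_number);

    -- Made-up sample rows.
    INSERT INTO loyalty (id, customer_id, order_id) VALUES
        (1, 10, 694781500), (2, 11, NULL), (3, 12, 4967472);
    INSERT INTO info (id, loyalty_adjustment_id, order_number) VALUES
        (1, 2, '21072621527905'), (2, 3, '21072620001657');
""")

# Same shape as Query 1: one filter with a condition on each table.
OR_QUERY = """
    SELECT loyalty.id, loyalty.order_id, info.order_number
    FROM loyalty
    LEFT OUTER JOIN info ON loyalty.id = info.loyalty_adjustment_id
    WHERE info.order_number = ? OR loyalty.order_id = ?
"""

# Each branch filters on a single table; UNION removes rows matched by both.
UNION_QUERY = """
    SELECT loyalty.id, loyalty.order_id, info.order_number
    FROM loyalty
    LEFT OUTER JOIN info ON loyalty.id = info.loyalty_adjustment_id
    WHERE info.order_number = ?
    UNION
    SELECT loyalty.id, loyalty.order_id, info.order_number
    FROM loyalty
    LEFT OUTER JOIN info ON loyalty.id = info.loyalty_adjustment_id
    WHERE loyalty.order_id = ?
"""

params = ("21072621527905", 694781500)
or_rows = sorted(conn.execute(OR_QUERY, params).fetchall(), key=lambda r: r[0])
union_rows = sorted(conn.execute(UNION_QUERY, params).fetchall(), key=lambda r: r[0])

print(or_rows)
print(union_rows)
```

Both branches select the same column list so that UNION can deduplicate a row that satisfies both conditions; with SELECT * the same holds as long as both branches use the same join.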