sql - Query plan: does the order of JOINs matter?

Question

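I want to check whether the order of JOINs in an SQL query matters for runtime and efficiency.

I am using PostgreSQL. To test this, I took the sample world database from MySQL ( https://downloads.mysql.com/docs/world.sql.zip ) and wrote the following two statements:

Query 1: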

EXPLAIN ANALYSE SELECT * FROM countrylanguage
    JOIN city ON city.countrycode = countrylanguage.countrycode
    JOIN country c ON city.countrycode = c.code

Query 2:

EXPLAIN ANALYSE SELECT * FROM city
    JOIN country c ON c.code = city.countrycode
    JOIN countrylanguage c2 on c.code = c2.countrycode

Query plan 1:

Hash Join  (cost=41.14..484.78 rows=29946 width=161) (actual time=1.472..17.602 rows=30670 loops=1)
  Hash Cond: (city.countrycode = countrylanguage.countrycode)
  ->  Seq Scan on city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.062..1.220 rows=4079 loops=1)
  ->  Hash  (cost=28.84..28.84 rows=984 width=130) (actual time=1.378..1.378 rows=984 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 172kB
        ->  Hash Join  (cost=10.38..28.84 rows=984 width=130) (actual time=0.267..0.823 rows=984 loops=1)
              Hash Cond: (countrylanguage.countrycode = c.code)
              ->  Seq Scan on countrylanguage  (cost=0.00..15.84 rows=984 width=17) (actual time=0.029..0.158 rows=984 loops=1)
              ->  Hash  (cost=7.39..7.39 rows=239 width=113) (actual time=0.220..0.220 rows=239 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 44kB
                    ->  Seq Scan on country c  (cost=0.00..7.39 rows=239 width=113) (actual time=0.013..0.137 rows=239 loops=1)
Planning Time: 3.818 ms
Execution Time: 18.801 ms

Query plan 2:

Hash Join  (cost=41.14..312.47 rows=16794 width=161) (actual time=2.415..18.628 rows=30670 loops=1)
  Hash Cond: (city.countrycode = c.code)
  ->  Seq Scan on city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.032..0.574 rows=4079 loops=1)
  ->  Hash  (cost=28.84..28.84 rows=984 width=130) (actual time=2.364..2.364 rows=984 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 171kB
        ->  Hash Join  (cost=10.38..28.84 rows=984 width=130) (actual time=0.207..1.307 rows=984 loops=1)
              Hash Cond: (c2.countrycode = c.code)
              ->  Seq Scan on countrylanguage c2  (cost=0.00..15.84 rows=984 width=17) (actual time=0.027..0.204 rows=984 loops=1)
              ->  Hash  (cost=7.39..7.39 rows=239 width=113) (actual time=0.163..0.163 rows=239 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 44kB
                    ->  Seq Scan on country c  (cost=0.00..7.39 rows=239 width=113) (actual time=0.015..0.049 rows=239 loops=1)
Planning Time: 1.901 ms
Execution Time: 19.694 ms

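The estimated costs and row counts differ, and the final hash condition is different. Does this mean the query planner did not do the same thing for both queries, or am I on the wrong track?

Thanks for your help!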

score 2 · Accepted Answer

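The problem is not the order of the joins, but that the join conditions are different: they refer to different tables.

In the first query, you join countrylanguage using the country code from city. In the second, you use the country code from country.

For inner joins, this should not affect the final result, and both plans do return 30670 rows. However, it evidently affects how the optimizer weighs the different paths, which is why the estimated row counts differ (29946 versus 16794).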
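
One way to separate the effect of join order from the effect of the join conditions is to keep the predicates of Query 1 exactly as written and change only the order in which the tables are listed. The sketch below assumes the same world schema as in the question. The second half is optional and relies on PostgreSQL's `join_collapse_limit` setting, where a value of 1 stops the planner from reordering explicit JOINs. No particular plan output is claimed here; it is something to try and compare.

```sql
-- Same join predicates as Query 1, but the tables are listed in a different order.
EXPLAIN ANALYZE SELECT * FROM city
    JOIN country c ON city.countrycode = c.code
    JOIN countrylanguage ON city.countrycode = countrylanguage.countrycode;

-- Optional: make the planner follow the written JOIN order for this session,
-- then rerun the statement to see what a fixed order costs.
SET join_collapse_limit = 1;

EXPLAIN ANALYZE SELECT * FROM city
    JOIN country c ON city.countrycode = c.code
    JOIN countrylanguage ON city.countrycode = countrylanguage.countrycode;

-- Restore the default so later queries are planned normally.
RESET join_collapse_limit;
```

If the first statement yields the same plan and estimates as Query 1, the difference seen in the question most likely comes from the predicates rather than from the order of the JOINs.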

score 2 · Accepted Answer

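(As mentioned above) the queries are not identical.
Although not identical, the plans are comparable: both hash-join a sequential scan of city against a hash built from countrylanguage joined to country.
Both queries execute in roughly 18 to 20 ms, so comparing their timings is nearly meaningless.
Queries on tables with insufficient structure (keys, indexes, statistics) but a small enough footprint (fitting in work_mem) will usually result in hash joins.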
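
To see how much of the plan is driven by missing structure, one could refresh the statistics, add an index on the join column, and temporarily discourage hash joins so the planner reveals its next-best strategy. This is only a sketch: the index name `city_countrycode_idx` is hypothetical, it is an assumption that no such index already exists in the imported schema, and `enable_hashjoin` is a session-level planner setting meant for experiments, not for production use.

```sql
-- Refresh planner statistics for the three tables.
ANALYZE city;
ANALYZE country;
ANALYZE countrylanguage;

-- Hypothetical index name; the column is the join column from the question.
CREATE INDEX IF NOT EXISTS city_countrycode_idx ON city (countrycode);

-- Check how much memory a hash table may use before spilling into batches.
SHOW work_mem;

-- Session-local: discourage hash joins to see which strategy is chosen instead.
SET enable_hashjoin = off;

EXPLAIN ANALYZE SELECT * FROM city
    JOIN country c ON c.code = city.countrycode
    JOIN countrylanguage c2 ON c.code = c2.countrycode;

-- Restore the default planner behaviour.
RESET enable_hashjoin;
```

With tables this small, the alternative plan may well be slower than the hash join, which would support the point that the hash join is a sensible choice here.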
