我有两张表,一张用于个人资料,一张用于个人资料的就业状况。这两个表具有一对一的关系。一份个人资料可能没有工作状态。表模式如下(为清楚起见,删除了不相关的列):
create type employment_status as enum ('claimed', 'approved', 'denied');
create table if not exists profiles
(
id bigserial not null
constraint profiles_pkey
primary key
);
create table if not exists employments
(
id bigserial not null
constraint employments_pkey
primary key,
status employment_status not null,
profile_id bigint not null
constraint fk_rails_d95865cd58
references profiles
on delete cascade
);
create unique index if not exists index_employments_on_profile_id
on employments (profile_id);
使用这些表格,我被要求列出所有失业的个人资料。失业档案被定义为没有就业记录或就业状态不是“已批准”的档案。
我的第一个尝试是以下查询:
SELECT * FROM "profiles"
LEFT JOIN employments ON employments.profile_id = profiles.id
WHERE employments.status != 'approved'
这里的假设是所有配置文件都将与他们各自的工作一起列出,然后我可以使用 where 条件过滤它们。任何没有就业记录的个人资料都将具有就业状态,null
因此会被条件过滤。但是,此查询不会返回没有工作的个人资料。
经过一番研究,我发现了这篇文章,解释了为什么它不起作用并转换了我的查询:
SELECT *
FROM profiles
LEFT JOIN employments ON profiles.id = employments.profile_id and employments.status != 'approved';
实际上确实有效。但是,我的 ORM 产生了一个稍微不同的查询,但它不起作用。
SELECT profiles.* FROM "profiles"
LEFT JOIN employments ON employments.profile_id = profiles.id AND employments.status != 'approved'
唯一的区别是 select 子句。我试图理解为什么这种细微的差异会产生如此大的差异,然后对所有三个查询进行了解释分析:
EXPLAIN ANALYZE SELECT * FROM "profiles"
LEFT JOIN employments ON employments.profile_id = profiles.id
WHERE employments.status != 'approved'
Hash Join (cost=14.28..37.13 rows=846 width=452) (actual time=0.025..0.027 rows=2 loops=1)
Hash Cond: (e.profile_id = profiles.id)
-> Seq Scan on employments e (cost=0.00..20.62 rows=846 width=68) (actual time=0.008..0.009 rows=2 loops=1)
Filter: (status <> ''approved''::employment_status)
Rows Removed by Filter: 1
-> Hash (cost=11.90..11.90 rows=190 width=384) (actual time=0.007..0.007 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on profiles (cost=0.00..11.90 rows=190 width=384) (actual time=0.003..0.004 rows=8 loops=1)
Planning Time: 0.111 ms
Execution Time: 0.053 ms
EXPLAIN ANALYZE SELECT *
FROM profiles
LEFT JOIN employments ON profiles.id = employments.profile_id and employments.status != 'approved';
Hash Right Join (cost=14.28..37.13 rows=846 width=452) (actual time=0.036..0.042 rows=8 loops=1)
Hash Cond: (employments.profile_id = profiles.id)
-> Seq Scan on employments (cost=0.00..20.62 rows=846 width=68) (actual time=0.005..0.005 rows=2 loops=1)
Filter: (status <> ''approved''::employment_status)
Rows Removed by Filter: 1
-> Hash (cost=11.90..11.90 rows=190 width=384) (actual time=0.015..0.015 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on profiles (cost=0.00..11.90 rows=190 width=384) (actual time=0.010..0.011 rows=8 loops=1)
Planning Time: 0.106 ms
Execution Time: 0.108 ms
EXPLAIN ANALYZE SELECT profiles.* FROM "profiles"
LEFT JOIN employments ON employments.profile_id = profiles.id AND employments.status != 'approved'
Seq Scan on profiles (cost=0.00..11.90 rows=190 width=384) (actual time=0.006..0.007 rows=8 loops=1)
Planning Time: 0.063 ms
Execution Time: 0.016 ms
第一个和第二个查询计划几乎相同,一个是哈希连接,另一个是右哈希连接,而最后一个查询甚至不做连接或 where 条件。
我想出了一个确实有效的第四个查询:
EXPLAIN ANALYZE SELECT profiles.* FROM profiles
LEFT JOIN employments ON employments.profile_id = profiles.id
WHERE (employments.id IS NULL OR employments.status != 'approved')
Hash Right Join (cost=14.28..35.02 rows=846 width=384) (actual time=0.021..0.026 rows=7 loops=1)
Hash Cond: (employments.profile_id = profiles.id)
Filter: ((employments.id IS NULL) OR (employments.status <> ''approved''::employment_status))
Rows Removed by Filter: 1
-> Seq Scan on employments (cost=0.00..18.50 rows=850 width=20) (actual time=0.002..0.003 rows=3 loops=1)
-> Hash (cost=11.90..11.90 rows=190 width=384) (actual time=0.011..0.011 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on profiles (cost=0.00..11.90 rows=190 width=384) (actual time=0.007..0.008 rows=8 loops=1)
Planning Time: 0.104 ms
Execution Time: 0.049 ms
我对这个主题的问题是:
- 为什么第二个和第三个查询的查询计划不同,即使它们具有相同的结构?
- 为什么查询计划第一个和第四个查询不同,即使它们的结构相同?
- 为什么 Postgres 完全忽略我的连接以及第三个查询的条件?
编辑:
对于以下示例数据,预期的查询应返回 2 和 3。
insert into profiles values (1);
insert into profiles values (2);
insert into profiles values (3);
insert into employments (profile_id, status) values (1, 'approved');
insert into employments (profile_id, status) values (2, 'denied');