sql - Optimizing the SQL "WHERE" clause of a query with subqueries

Question

Suppose I have the following hypothetical data structure:

create table "country"
(
  country_id integer,  
  country_name varchar(50),
  continent varchar(50),
  constraint country_pkey primary key (country_id)
);

create table "person"
(
  person_id integer,
  person_name varchar(100),
  country_id integer,
  constraint person_pkey primary key (person_id)
);

create table "event"
(
  event_id integer,
  event_desc varchar(100),
  country_id integer,
  constraint event_pkey primary key (event_id)
);

I want to query the number of person rows and event rows for each country. I decided to use subqueries.

select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.country_name

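I know you can do this with select statements in the field list, but the advantage of subqueries is that I can more flexibly change the SQL to summarize by another field. Say I change the query to display by continent: it is as simple as replacing the field "c.country_name" with "c.continent".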

My question is about filtering. If we add a where clause like this:

select c.country_name, 
  sum(sub1.person_count) as person_count, 
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name

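The subqueries still seem to perform the count for all countries. Assume the person and event tables are huge and that I already have indexes on country_id in all tables. It is really slow. Shouldn't the database execute the subqueries only for the country being filtered? Do I have to recreate the country filter in every subquery (which is very tedious and makes the code hard to modify)? By the way, I am using both PostgreSQL 8.3 and 9.0, but I guess the same happens in other databases.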

score 2 · Accepted Answer

Shouldn't the database execute the subqueries only for the country being filtered?

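No. The first step in a query like yours seems to be building a working table from all the table constructors in the FROM clause. The WHERE clause is evaluated after that.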
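Rather than guessing, the plan can be inspected directly. A minimal sketch, assuming the tables from the question; EXPLAIN ANALYZE is available in both PostgreSQL 8.3 and 9.0, and note that it actually runs the query:

```sql
-- Sketch: ask PostgreSQL for the plan it chose, with actual row counts.
-- EXPLAIN ANALYZE executes the statement, so expect it to take as long as the query itself.
explain analyze
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name;
```

If the aggregate nodes over "person" and "event" report about one output row per country, the counts are being computed for every country before the filter on country_name takes effect.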

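Imagine what you would do if sub1 and sub2 were both base tables instead of subselects. Each would have two columns and one row per country_id. If you wanted to join all the rows, you would write this.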

from
  "country" c
  left join sub1 on (c.country_id=sub1.country_id)
  left join sub2 on (c.country_id=sub2.country_id)

But if you wanted to join on a single row, you would write something equivalent to this.

from
  "country" c
  left join (select * from sub1 where country_id = ?) sub1
    on (c.country_id=sub1.country_id)
  left join (select * from sub2 where country_id = ?) sub2
    on (c.country_id=sub2.country_id)
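Applied to the query in the question, that idea might look like the sketch below. It is only an illustration under the question's schema: the aliases p and e are introduced here, and the country_id is looked up from country_name inside each subquery because the question filters by name.

```sql
-- Sketch: repeat the country filter inside each derived table so that
-- only the matching country's rows are counted.
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (
    select p.country_id, count(*) as person_count
    from "person" p
    where p.country_id in (select country_id from "country" where country_name = 'UNITED STATES')
    group by p.country_id
  ) sub1 on (c.country_id = sub1.country_id)
  left join (
    select e.country_id, count(*) as event_count
    from "event" e
    where e.country_id in (select country_id from "country" where country_name = 'UNITED STATES')
    group by e.country_id
  ) sub2 on (c.country_id = sub2.country_id)
where c.country_name = 'UNITED STATES'
group by c.country_name;
```

The filter now appears three times, which is exactly the maintenance cost the question was hoping to avoid; the gain is that each subquery can use the country_id index instead of counting the whole table.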

Joe Celko, who helped develop the early SQL standards, has often written on Usenet about how SQL's order of evaluation works.

score 0 · Accepted Answer

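Can you filter/group the rows by country_id rather than country_name? I guess you don't have an index on the name.
It is expected that the subqueries use no index, because as written they scan the whole tables. If you want to reduce the scanning, you should filter the data.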
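One way to filter the data while writing the condition only once is the "select in the field list" form the question mentions, wrapped in an outer query to keep the grouping flexible. This is a sketch, not a measured recommendation; the aliases t, p and e are introduced here, and it assumes the country_id indexes described in the question.

```sql
-- Sketch: correlated scalar subqueries run once per country row that
-- survives the WHERE clause, and each can use the index on country_id.
select t.country_name,
  sum(t.person_count) as person_count,
  sum(t.event_count) as event_count
from (
  select c.country_name,
    (select count(*) from "person" p where p.country_id = c.country_id) as person_count,
    (select count(*) from "event" e where e.country_id = c.country_id) as event_count
  from "country" c
  where c.country_name = 'UNITED STATES'
) t
group by t.country_name;
```

To summarize by continent instead, select c.continent in the inner query and group by it in the outer one. One behavioral difference from the left join version: a country with no persons or events yields 0 here rather than NULL.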
