database - How to use GROUP BY in PostgreSQL

Question

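The goal is to write a query against two different tables, country and city. Country contains name (of the country) and country_code (primary key); city contains name (of the city), population, and country_code (primary key). I want to use an aggregate with GROUP BY, but the query below doesn't work.

For each country, list the largest population of any of its cities and the name of that city. So I need to list the most populous city for each country.

So what should be displayed is the country, the city (the one with the largest population), and then that city's population. There should be only one city per country.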

$query6 = "SELECT c.name AS country, ci.name AS city,
GREATEST(ci.population) AS max_pop
FROM lab6.country c INNER JOIN lab6.city ci
ON(c.country_code = ci.country_code)
GROUP BY c.name
ORDER BY country ASC";

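I have also tried GROUP BY country and DISTINCT c.name.

I'm new to aggregate functions, so if there are specific situations where you should use GROUP BY and this isn't one of them, please let me know.

I'm using PHP to run the query, like this: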

$result = pg_query($connection, $query);
if(!$result)
{
       die("Failed to connect to database");
}

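The error is: ERROR: column "ci.name" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT DISTINCT c.name AS country, ci.name AS city

The tables were given to us; we didn't create them, and I can't include a screenshot of the table definitions because I don't have enough reputation.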

score 4 · Accepted Answer

Some DDL to play with.

create table country (
  country_code char(2) primary key, -- ISO country code
  country_name varchar(35) not null unique
);

insert into country values 
('US', 'United States of America'),
('IT', 'Italy'),
('IN', 'India');

-- The full name of a city is more than city name plus country name.
-- In the US, there are a couple of dozen cities named Springfield,
-- each in a different state. I'd be surprised if this weren't true
-- in most countries.
create table city (
  country_code char(2) not null references country (country_code),
  name varchar(35) not null,
  population integer not null check (population > 0),
  primary key (country_code, name)
);

insert into city values 
('US', 'Rome, GA', 36303),
('US', 'Washington, DC', 632323),
('US', 'Springfield, VA', 30484),
('IT', 'Rome', 277979),
('IT', 'Milan', 1324110),
('IT', 'Bari', 320475),
('IN', 'Mumbai', 12478447),
('IN', 'Patna', 1683200),
('IN', 'Cuttack', 606007);

Largest city population per country.

select country.country_code, max(city.population) as max_population
from country
inner join city on country.country_code = city.country_code
group by country.country_code;

There are several ways to use this to get the result you want. One way is an inner join on a common table expression.

with max_population as (
  select country.country_code, max(city.population) as max_population
  from country
  inner join city on country.country_code = city.country_code
  group by country.country_code
)
select city.country_code, city.name, city.population
from city
inner join max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;

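Another way is an inner join on a subquery. (The text of the common table expression "goes into" the main query. With the alias "max_population", the query needs no further changes to work.)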

select city.country_code, city.name, city.population
from city
inner join (select country.country_code, max(city.population) as max_population
            from country
            inner join city on country.country_code = city.country_code
            group by country.country_code
           ) max_population 
        on max_population.country_code = city.country_code
       and max_population.max_population = city.population;
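A PostgreSQL-specific alternative that this answer does not cover is DISTINCT ON. The sketch below runs against the sample city table above and keeps only the first row per country_code in ORDER BY order. Unlike the joins on max(population), it returns exactly one city per country even when two cities tie; the trailing name in the ORDER BY is an arbitrary tie-breaker chosen for illustration.

```sql
-- DISTINCT ON is a PostgreSQL extension, not standard SQL.
-- For each country_code it keeps the first row in ORDER BY order,
-- which here is the row with the largest population.
select distinct on (country_code)
       country_code, name, population
from city
order by country_code,      -- must lead the ORDER BY to match DISTINCT ON
         population desc,   -- largest city first within each country
         name;              -- arbitrary tie-breaker (an assumption)
```

With the sample data this should return Mumbai for IN, Milan for IT, and Washington, DC for US.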

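Yet another way is to use a window function in a subquery. You need to select from a subquery because you can't use the result of rank() directly in the WHERE clause: window functions are evaluated after WHERE has already filtered the rows. That is, this works.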

select country_code, name, population
from (select country_code, name, population,
      rank() over (partition by country_code 
                   order by population desc) as city_population_rank
      from city
     ) city_population_rankings
where city_population_rank = 1;

But this doesn't, even though it looks more natural at first glance.

select country_code, name, population,
       rank() over (partition by country_code 
                    order by population desc) as city_population_rank
from city
where city_population_rank = 1;

ERROR:  column "city_population_rank" does not exist
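Applied to the tables in the question, the window-function approach might look like the sketch below. It assumes the schema is exactly as the question describes it (lab6.country with name and country_code, lab6.city with name, population, and country_code); the aliases ranked and pop_rank are made up for illustration. Note that GREATEST() in the original query compares its arguments within a single row and is not an aggregate, so it cannot stand in for max().

```sql
-- Sketch only: column names come from the question's description of lab6.
select country, city, max_pop
from (select c.name        as country,
             ci.name       as city,
             ci.population as max_pop,
             -- rank cities within each country, largest population first
             rank() over (partition by c.country_code
                          order by ci.population desc) as pop_rank
      from lab6.country c
      inner join lab6.city ci on c.country_code = ci.country_code
     ) ranked
where pop_rank = 1          -- legal here because pop_rank is a subquery column
order by country asc;
```

No GROUP BY is needed in this form, which also avoids the "must appear in the GROUP BY clause" error from the question.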

score 0 · Accepted Answer

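The best way to do this is with a recent version of PostgreSQL that supports window functions. (Docs.) Before window functions existed, you had to do ugly things whenever you wanted to bring other columns from a particular row (for example, the row with the largest population) into the final output.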

WITH preliminary AS 
     (SELECT country_code, city_name, population,
      rank() OVER (PARTITION BY country_code ORDER BY population DESC) AS r
      FROM country
      NATURAL JOIN city) -- NATURAL JOIN collapses 2 country_code columns into 1
SELECT * FROM preliminary WHERE r=1;

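In the admittedly unlikely case that two or more of the largest cities in a country have exactly the same population, this also does something sensible: rank() gives every tied city rank 1, so all of them are returned.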
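To make the tie behaviour concrete, here is a small self-contained sketch that uses an inline VALUES list instead of the real tables; the country code, city names, and populations are invented. It contrasts rank(), which lets tied rows share a rank, with row_number(), which numbers them in an arbitrary order.

```sql
-- Hypothetical data: two cities tied for the largest population.
with city_sample (country_code, city_name, population) as (
  values ('XX', 'Alpha', 500000),
         ('XX', 'Beta',  500000),
         ('XX', 'Gamma', 100000)
)
select country_code, city_name, population,
       rank()       over w as r,   -- Alpha and Beta both get 1, Gamma gets 3
       row_number() over w as rn   -- Alpha and Beta get 1 and 2 in some order
from city_sample
window w as (partition by country_code order by population desc);
```

Filtering on r = 1 would therefore keep both tied cities, while filtering on rn = 1 would keep only one of them.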

[Edited in response to a comment]

Before window functions, my usual approach was

SELECT country_code, city_name, population
FROM country co1 NATURAL JOIN city ci1
WHERE ROW(co1.country_code, ci1.population) =
    (SELECT co2.country_code, ci2.population 
     FROM country co2 NATURAL JOIN city ci2
     WHERE co1.country_code = co2.country_code 
     ORDER BY population DESC LIMIT 1);
-- note for lurkers, some other DBs use TOP 1 instead of LIMIT

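The performance of this is not bad, because Postgres can optimize the correlated subquery if the database is sensibly indexed. Compare this with the inner-join-on-subquery approach in Mike Sherrill's answer.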
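As one guess at what a sensible index could look like here, a composite index on the country column followed by population in descending order would let an ORDER BY population DESC LIMIT 1 lookup for a single country be answered from the index. The index name is made up, and the sketch uses the city table from the DDL in the other answer. Whether the planner actually uses the index depends on table size and statistics, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical index: country first, then population largest-first.
create index city_country_population_idx
    on city (country_code, population desc);

-- Inspect the plan for a single-country "largest city" lookup.
explain
select population
from city
where country_code = 'IT'
order by population desc
limit 1;
```

On a table as small as the sample data, a sequential scan is likely to be chosen regardless of the index.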

Go with your instructor's answer, OK? With what you have been given so far, it may be inefficient, incomplete, or both.
