sql - PostgreSQL - Getting the columns related to an aggregated column

Question

I have a table called "places":

origin | destiny | distance
---------------------------
A      | X       | 5
A      | Y       | 8
B      | X       | 12
B      | Y       | 9
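
To reproduce the examples in this thread, the table can be set up roughly as follows. This is only a sketch: the column types are an assumption, since the question shows the data but not the table definition.

```sql
-- Column types are assumed; only the column names and rows come from the question.
create table places (
    origin   text    not null,
    destiny  text    not null,
    distance integer not null
);

insert into places (origin, destiny, distance) values
    ('A', 'X', 5),
    ('A', 'Y', 8),
    ('B', 'X', 12),
    ('B', 'Y', 9);
```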

For each origin, I want to find the closest destiny. In MySQL I could do

SELECT origin, destiny, MIN(distance) FROM places GROUP BY origin

and I could expect the following result:

origin | destiny | distance
---------------------------
A      | X       | 5
B      | Y       | 9

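Unfortunately, this query does not work in PostgreSQL. Postgres forces me to either wrap "destiny" in its own aggregate function or add it as another argument of the GROUP BY clause. Both "solutions" completely change the result I want.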
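
To make the problem concrete, here is a sketch of the two rewrites mentioned above, run against the four sample rows. The column aliases are added only for readability.

```sql
-- Wrapping destiny in its own aggregate: the query runs, but min(destiny)
-- and min(distance) are computed independently of each other.
select origin, min(destiny) as destiny, min(distance) as distance
from places
group by origin;
-- With the sample data, origin B yields (B, X, 9), a pair that is not in the table.

-- Adding destiny to GROUP BY: one row per (origin, destiny) pair,
-- so all four sample rows come back instead of one row per origin.
select origin, destiny, min(distance) as distance
from places
group by origin, destiny;
```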

How can I translate the MySQL query above to PostgreSQL?

score 6 · Accepted Answer

MySQL is one of the very few DBMSs that allow this broken ("loose" in MySQL terms) GROUP BY handling. Most other DBMSs (including Postgres) reject your original statement.

In Postgres you can use the distinct on operator to achieve the same thing:

select distinct on (origin) 
       origin, 
       destiny, 
       distance
from places
order by origin, distance;
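
One detail worth spelling out: distinct on keeps the first row of each origin group according to the order by, so if two destinies are equally close, which one is returned is not determined by the query above. A minimal sketch of a deterministic variant, assuming that picking the alphabetically first destiny is an acceptable tie-breaker:

```sql
select distinct on (origin)
       origin,
       destiny,
       distance
from places
order by origin, distance, destiny;  -- destiny only breaks ties between equal distances
```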

The ANSI solution would be something like this:

select p.origin, 
       p.destiny, 
       p.distance
from places p
  join (select p2.origin, min(p2.distance) as distance
        from places  p2
        group by origin
) t on t.origin = p.origin and t.distance = p.distance
order by origin;

Or, without the join, using a window function:

select t.origin,
       t.destiny,
       t.distance
from (
    select origin, 
           destiny, 
           distance, 
           min(distance) over (partition by origin) as min_dist
    from places
) t 
where distance = min_dist
order by origin;

Or another solution using window functions:

select distinct origin,
       first_value(destiny) over (partition by origin order by distance) as destiny, 
       min(distance) over (partition by origin) as distance
from places
order by origin;

My guess is that the first one (the Postgres-specific one) is probably the fastest.
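
That guess can be checked on real data with explain analyze. The sketch below also creates an index on (origin, distance), which may help the distinct on query; the index name is hypothetical, and on a tiny table the planner may well ignore the index altogether.

```sql
-- places_origin_distance_idx is a hypothetical name chosen for this example.
create index places_origin_distance_idx on places (origin, distance);

-- Shows the chosen plan together with actual run times; repeat for the
-- other variants to compare them on your own data.
explain analyze
select distinct on (origin)
       origin, destiny, distance
from places
order by origin, distance;
```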

Here is an SQLFiddle for all three solutions: http://sqlfiddle.com/#!12/68308/2

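Note that the MySQL result might actually be incorrect, as it returns an arbitrary (= random) value for destiny. The value MySQL returns might not be the one that belongs to the lowest distance.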

More details about the broken GROUP BY handling in MySQL can be found here: http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/

score 2 · Accepted Answer

Just to add another possible solution to a_horse_with_no_name's answer - using the window function row_number:

with cte as (
    select
        row_number() over(partition by origin order by distance) as row_num,
        *
    from places
)
select
    origin, 
    destiny, 
    distance    
from cte
where row_num = 1

It also works in SQL Server or any other RDBMS that supports row_number. In PostgreSQL, though, I prefer the distinct on syntax.
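
Note that row_number returns exactly one row per origin even when several destinies are equally close. If all tied rows are wanted instead, one option (a sketch, not part of the original answer) is to swap in rank, which assigns the same number to equal distances; rnk is just an illustrative column alias.

```sql
with cte as (
    select
        rank() over(partition by origin order by distance) as rnk,
        *
    from places
)
select
    origin,
    destiny,
    distance
from cte
where rnk = 1;  -- every row tied for the minimum distance has rnk = 1
```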


score 2 · Accepted Answer

The cleanest way to do this in PostgreSQL (in my opinion) is to use an aggregate function that clearly specifies which value of destiny should be selected.

The desired value can be described as "the first matching destiny, if you order the matching rows by their distance".

You therefore need two things:

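- A "first" aggregate, which simply returns the "first" of a list of values. This is easy to define, but it is not included as standard.
- The ability to specify the order in which those matching rows are fed to the aggregate (otherwise, as with MySQL's "loose GROUP BY", it is undefined which value you actually get). This was added in PostgreSQL 9.0, and the syntax is documented under "Aggregate Expressions".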
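
For reference, a first aggregate can be sketched along the lines of the definition commonly published on the PostgreSQL wiki. The helper name first_agg is simply the label used here, and this version returns the first non-null value, which may or may not be what you want.

```sql
-- State transition function: keep whatever state is already there.
-- Because it is STRICT and the aggregate has no initial condition, the first
-- non-null input becomes the state and every later call returns it unchanged.
create or replace function first_agg(anyelement, anyelement)
returns anyelement
language sql immutable strict
as $$ select $1 $$;

-- The aggregate itself; usable as first(expr order by ...).
create aggregate first(anyelement) (
    sfunc = first_agg,
    stype = anyelement
);
```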

Once a first() aggregate has been defined (which only needs to be done once per database, when you set up your initial tables), you can write:

Select
       origin, 
       first(destiny Order by distance Asc) as closest_destiny, 
       min(distance) as closest_destiny_distance
       -- Or, equivalently: first(distance Order by distance Asc) as closest_destiny_distance
from places
group by origin
order by origin;

Here is an SQLFiddle demo showing the whole thing in action.
