8

是否可以选择DISTINCT ON一些单独的、独立的列集的行?

假设我想要所有符合以下条件的行:

  • 区别于(name, birth)
  • 区别于(name, height)

因此,在下表中,标有红叉的行不会是不同的(带有失败子句的指示):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ❌ (name, birth)
William    1981      1.82   ❌ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ❌ (name, height)
Tom        1971      1.86
James      1981      1.77   ❌ (name, birth)
Tom        1971      1.89   ❌ (name, birth)

在上面的例子中,如果DISTINCT ON子句刚刚是DISTINCT ON (name, birth, height),那么所有的行都会被认为是不同的。

试过但没有用:

  • SELECT DISTINCT ON (name, birth) (name, height) ...
  • SELECT DISTINCT ON (name, birth), (name, height) ...
  • SELECT DISTINCT ON ((name, birth), (name, height)) ...
  • SELECT DISTINCT ON (name, birth) AND (name, height) ...
  • SELECT DISTINCT ON (name, birth) AND ON (name, height) ...
  • SELECT DISTINCT ON (name, birth) DISTINCT ON (name, height) ...
  • SELECT DISTINCT ON (name, birth), DISTINCT ON (name, height) ...
4

2 回答 2

12

正如所评论的那样,这个问题存在歧义。每次调用的结果行数可能不同。如果您对任意结果感到满意,@klin 的解决方案就足够了。否则,您需要更紧密地定义需求。喜欢:
distinct on (name, birth),首先选择最小的高度,然后选择最小的ID作为决胜局

或者:
distinct on (name, height),首先选择最早的出生,然后选择最小的 ID 作为 tiebreaker

您的表应该有一个主键(或某种唯一标识行的方式):

CREATE TEMP TABLE tbl (
  tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);

INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

询问:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) sub
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker
 tbl_id |  name   | birth | height
--------+---------+-------+--------
      2 | James   |  1981 |   1.68
      8 | Mike    |  1963 |   1.68
      4 | Tom     |  1967 |   1.79
      9 | Tom     |  1971 |   1.86
      7 | Tom     |  1978 |   1.92
      5 | William |  1976 |   1.74
      6 | William |  1981 |   1.82
(7 rows)    -- !!!

DISTINCT ON没有确定性的查询ORDER BY可以从每组欺骗中返回任意行。应用一次,您仍然可以获得确定的行数(任意选择)。重复应用,得到的行数也是任意的。有关的:

于 2017-02-28T15:32:30.080 回答
1

使用派生表:

with my_table(name, birth, height) as (
values
('William',    1976,      1.82),
('James',      1981,      1.68),
('Mike',       1976,      1.68),
('Tom',        1967,      1.79),
('William',    1976,      1.74),  -- ? (name, birth)
('William',    1981,      1.82),  -- ? (name, height)
('Tom',        1978,      1.92),
('Mike',       1963,      1.68),  -- ? (name, height)
('Tom',        1971,      1.86),
('James',      1981,      1.77),  -- ? (name, birth)
('Tom',        1971,      1.89)   -- ? (name, birth)
)
select distinct on (name, height) *
from (
    select distinct on (name, birth) *
    from my_table
    ) s

  name   | birth | height 
---------+-------+--------
 James   |  1981 |   1.68
 Mike    |  1963 |   1.68
 Tom     |  1967 |   1.79
 Tom     |  1971 |   1.89
 Tom     |  1978 |   1.92
 William |  1976 |   1.82
(6 rows)        
于 2017-02-28T10:09:52.547 回答