39

PostgreSQL DISTINCT ON with different ORDER BY

I have a table purchases (product_id, purchased_at, address_id).

Sample data:

| id | product_id |   purchased_at    | address_id |
| 1  |     2      | 20 Mar 2012 21:01 |     1      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 3  |     2      | 20 Mar 2012 21:39 |     2      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

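The result I expect is the most recently purchased product (the full row) for each address_id, and that result must be sorted in descending order by the purchased_at field: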

| id | product_id |   purchased_at    | address_id |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |

With this query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 2
ORDER BY purchases.address_id ASC, purchases.purchased_at DESC

I am getting:

| id | product_id |   purchased_at    | address_id |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

So the rows are the same, but the order is wrong. Is there any way to fix this?


3 Answers

26

Quite a clear question :)

SELECT t1.* FROM purchases t1
LEFT JOIN purchases t2
ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC

And most likely a faster approach:

SELECT t1.* FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at
    FROM purchases
    GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
answered 2012-03-20T22:45:08.967
13

DISTINCT ON uses your ORDER BY to pick which row to produce for each distinct address_id. If you then want to order the resulting records, make the DISTINCT ON a subselect and order its results:

SELECT * FROM
(
  SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
  FROM "purchases"
  WHERE "purchases"."product_id" = 2
  ORDER BY purchases.address_id ASC, purchases.purchased_at DESC
) distinct_addrs
order by distinct_addrs.purchased_at DESC
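
To try this against the question's sample data, here is a minimal sketch for PostgreSQL. The column types and the temporary table are assumptions (the question does not give the schema), and the subquery alias latest_per_address is only illustrative.

```sql
-- Assumed schema: the question only lists the column names.
CREATE TEMP TABLE purchases (
    id           integer PRIMARY KEY,
    product_id   integer NOT NULL,
    purchased_at timestamp NOT NULL,
    address_id   integer NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO purchases (id, product_id, purchased_at, address_id) VALUES
    (1, 2, '2012-03-20 21:01', 1),
    (2, 2, '2012-03-20 21:33', 1),
    (3, 2, '2012-03-20 21:39', 2),
    (4, 2, '2012-03-20 21:48', 2);

-- Inner query: one row per address_id, the latest by purchased_at.
-- Outer query: re-sort those rows by purchased_at alone.
SELECT *
FROM (
    SELECT DISTINCT ON (address_id) *
    FROM purchases
    WHERE product_id = 2
    ORDER BY address_id, purchased_at DESC
) AS latest_per_address
ORDER BY purchased_at DESC;
```

With these four rows the outer query should return id 4 first and id 2 second, which is the order the question asks for.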
answered 2012-03-21T04:18:01.187
4

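This query is trickier to rephrase correctly than it looks.

The currently accepted, join-based answer does not correctly handle the case where two candidate rows for the same address_id share the same purchased_at value: it returns both rows.

You can get the right behaviour this way: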
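
A small sketch of that failure, using a hypothetical table purchases_tie (not part of the question) that holds two purchases for the same address at the same instant:

```sql
-- Hypothetical tie data; table name and column types are assumptions.
CREATE TEMP TABLE purchases_tie (
    id           integer PRIMARY KEY,
    product_id   integer NOT NULL,
    purchased_at timestamp NOT NULL,
    address_id   integer NOT NULL
);

INSERT INTO purchases_tie (id, product_id, purchased_at, address_id) VALUES
    (1, 2, '2012-03-20 21:33', 1),
    (2, 2, '2012-03-20 21:33', 1);

-- The max()-based join from the accepted answer: both rows equal the
-- per-address maximum, so both rows come back for address_id = 1.
SELECT t1.*
FROM purchases_tie t1
JOIN (
    SELECT address_id, max(purchased_at) AS max_purchased_at
    FROM purchases_tie
    GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC;
```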

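SELECT * FROM purchases AS given
WHERE product_id = 2
AND NOT EXISTS (
    SELECT NULL FROM purchases AS other
    WHERE given.address_id = other.address_id
    AND other.product_id = given.product_id  -- compare only against the same product
    -- a later purchase wins; on equal purchased_at, the higher id wins
    AND (given.purchased_at < other.purchased_at
         OR (given.purchased_at = other.purchased_at AND given.id < other.id))
)
ORDER BY purchased_at DESC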

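Note how it falls back to comparing the id values to disambiguate the case where the purchased_at values match. This ensures that the condition can only ever be true for a single row among those that share the same address_id value.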

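The original query using DISTINCT ON handles this case automatically: it always returns exactly one row per address_id, although which of the tied rows it keeps is unpredictable unless the ORDER BY also breaks the tie.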
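
If the choice among tied rows matters, the DISTINCT ON form can be made deterministic by extending its ORDER BY. The sketch below assumes that id is unique and that the row with the higher id should win, the same rule the NOT EXISTS query aims for; the alias latest_per_address is illustrative.

```sql
SELECT *
FROM (
    SELECT DISTINCT ON (address_id) *
    FROM purchases
    WHERE product_id = 2
    -- id DESC is an assumed tie-breaker: among rows sharing the latest
    -- purchased_at for an address, keep the one with the highest id.
    ORDER BY address_id, purchased_at DESC, id DESC
) AS latest_per_address
ORDER BY purchased_at DESC, id DESC;
```

The leading ORDER BY expression still has to match the DISTINCT ON expression; only the trailing keys are free to choose.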

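Also note that you are forced to encode the fact that you want the "latest for each address_id" twice, once in the given.purchased_at < other.purchased_at condition and once in the ORDER BY purchased_at DESC clause, and you have to make sure the two match. I had to spend a few extra minutes convincing myself that this query really is correct.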

It is much easier to write this query correctly and understandably by using DISTINCT ON together with an outer subquery, as suggested by dbenhur.

answered 2017-07-12T18:34:23.360