I have a table that records which person visited which city at a given timestamp:
city_visits:
person_id    city         timestamp
-----------------------------------------------
1            Paris        2017-01-01 00:00:00
1            Amsterdam    2017-01-03 00:00:00
1            Brussels     2017-01-04 00:00:00
1            London       2017-01-06 00:00:00
2            Berlin       2017-01-01 00:00:00
2            Brussels     2017-01-02 00:00:00
2            Berlin       2017-01-06 00:00:00
2            Hamburg      2017-01-07 00:00:00
A second table records when a person bought an ice cream:
ice_cream_events:
person_id    flavour       timestamp
-----------------------------------------------
1            Vanilla       2017-01-02 00:12:00
1            Chocolate     2017-01-05 00:18:00
2            Strawberry    2017-01-03 00:09:00
2            Caramel       2017-01-05 00:15:00
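To reproduce the problem, the two tables could be set up roughly as follows. This is only a sketch: the column types are assumptions (the listings show values, not types), and "timestamp" is quoted in the definitions purely as a precaution because it is also a type name.

```sql
-- Assumed schema: integer ids, text labels, timestamps without time zone.
CREATE TABLE city_visits (
    person_id   integer   NOT NULL,
    city        text      NOT NULL,
    "timestamp" timestamp NOT NULL
);

CREATE TABLE ice_cream_events (
    person_id   integer   NOT NULL,
    flavour     text      NOT NULL,
    "timestamp" timestamp NOT NULL
);

-- Sample rows copied from the two listings above.
INSERT INTO city_visits (person_id, city, "timestamp") VALUES
    (1, 'Paris',     '2017-01-01 00:00:00'),
    (1, 'Amsterdam', '2017-01-03 00:00:00'),
    (1, 'Brussels',  '2017-01-04 00:00:00'),
    (1, 'London',    '2017-01-06 00:00:00'),
    (2, 'Berlin',    '2017-01-01 00:00:00'),
    (2, 'Brussels',  '2017-01-02 00:00:00'),
    (2, 'Berlin',    '2017-01-06 00:00:00'),
    (2, 'Hamburg',   '2017-01-07 00:00:00');

INSERT INTO ice_cream_events (person_id, flavour, "timestamp") VALUES
    (1, 'Vanilla',    '2017-01-02 00:12:00'),
    (1, 'Chocolate',  '2017-01-05 00:18:00'),
    (2, 'Strawberry', '2017-01-03 00:09:00'),
    (2, 'Caramel',    '2017-01-05 00:15:00');
```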
For each row in the city_visits table, I need to join the next ice cream event for the same person, along with its timestamp and flavour:
Desired output:
person_id    city         timestamp              ic_flavour    ic_timestamp
---------------------------------------------------------------------------
1            Paris        2017-01-01 00:00:00    Vanilla       2017-01-02 00:12:00
1            Amsterdam    2017-01-03 00:00:00    Chocolate     2017-01-05 00:18:00
1            Brussels     2017-01-04 00:00:00    Chocolate     2017-01-05 00:18:00
1            London       2017-01-06 00:00:00    null          null
2            Berlin       2017-01-01 00:00:00    Strawberry    2017-01-03 00:09:00
2            Brussels     2017-01-02 00:00:00    Strawberry    2017-01-03 00:09:00
2            Berlin       2017-01-06 00:00:00    null          null
2            Hamburg      2017-01-07 00:00:00    null          null
I tried the following:
SELECT DISTINCT ON (cv.person_id, cv.timestamp)
cv.person_id,
cv.city,
cv.timestamp,
ic.flavour as ic_flavour,
ic.timestamp as ic_timestamp
FROM city_visits cv
JOIN ice_cream_events ic
ON ic.person_id = cv.person_id
AND ic.timestamp > cv.timestamp
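The DISTINCT ON clause keeps only one future ice cream event per city visit and discards the rest. That works, but it does not automatically pick the first one; instead it seems to pick an arbitrary future ice cream event for the same person. No ORDER BY clause I add seems to change this.
The ideal solution would be for the DISTINCT ON clause to pick the row with the minimum ic_timestamp each time it has to filter out duplicates.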
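A minimal sketch of that desired behaviour, assuming PostgreSQL, where DISTINCT ON is documented to keep the first row of each group in ORDER BY order. The ORDER BY has to start with the DISTINCT ON expressions; the trailing ic.timestamp then decides which event comes first within each visit. The switch to LEFT JOIN is an assumption on my part, made so that visits with no later ice cream event still appear with nulls, as in the desired output.

```sql
SELECT DISTINCT ON (cv.person_id, cv.timestamp)
       cv.person_id,
       cv.city,
       cv.timestamp,
       ic.flavour   AS ic_flavour,
       ic.timestamp AS ic_timestamp
FROM   city_visits cv
-- LEFT JOIN (assumption) keeps visits that have no later ice cream event.
LEFT   JOIN ice_cream_events ic
       ON  ic.person_id = cv.person_id
       AND ic.timestamp > cv.timestamp
-- Leading keys match DISTINCT ON; ic.timestamp ascending puts the
-- earliest future event first within each (person_id, timestamp) group.
ORDER  BY cv.person_id, cv.timestamp, ic.timestamp;
```

If this holds, the row kept for each city visit would be the one with the smallest ic_timestamp rather than an arbitrary one.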