postgresql - "View" (not delete) duplicate rows in a PostgreSQL table obtained from a join

Question

I have a temporary table created by joining three tables:

Trips
Stops
Stop_times

The Stop_times table holds a list of trip_ids, the corresponding stops, and the scheduled arrival and departure times of buses at those stops.

I searched online and found plenty of answers on how to delete duplicates (using ctid, nested queries), but none on how to view them.

My query looks like this:

CREATE TEMP TABLE temp AS
SELECT
    (CASE st.arrival_time < current_timestamp::time
     WHEN true THEN (current_timestamp::date + interval '1 day') + st.arrival_time
     ELSE (current_timestamp::date) + st.arrival_time
     END) AS arrival,
    CASE st.departure_time < current_timestamp::time
     WHEN true THEN (current_timestamp::date + interval '1 day') + st.departure_time
     ELSE (current_timestamp::date) + st.departure_time
     END AS departure,
    st.trip_id, st.stop_id, st.stop_headsign, route_id,
    t.trip_headsign, s.stop_code, s.stop_name, s.stop_lat, s.stop_lon
FROM schema.stop_times st
JOIN schema.trips t ON t.trip_id = st.trip_id
JOIN schema.stops s ON s.stop_id = st.stop_id
ORDER BY arrival, departure;

I know there are duplicates (I compared the results of SELECT * and SELECT DISTINCT on temp); I just need to identify them. Any help would be much appreciated!

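PS: I know I can use DISTINCT to eliminate the duplicates, but it slows the query down considerably, so I need to rework the query instead, and for that I need to identify the duplicates. The result has more than 200,000 records, so exporting them to Excel and filtering for duplicates there is not an option either (I tried, but Excel could not handle it).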

score 0 · Accepted Answer

I believe this will give you what you want:

SELECT arrival, departure, trip_id, stop_id, stop_headsign, route_id,
trip_headsign, stop_code, stop_name, stop_lat, stop_lon, count(*)
FROM temp
GROUP BY arrival, departure, trip_id, stop_id, stop_headsign, route_id,
trip_headsign, stop_code, stop_name, stop_lat, stop_lon
HAVING count(*) > 1;
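
The query above returns one line per duplicated combination, along with how many times it occurs. If you would rather see every duplicated row in full, one possible variant uses a window function in place of GROUP BY. This is only a sketch: it assumes the temp table has exactly the columns created in the question, and dup_count is a name made up here for illustration.

```sql
-- Sketch: list every row of temp that has at least one identical twin.
-- Assumes temp has the columns from the question; dup_count is a
-- hypothetical alias introduced for this example.
SELECT *
FROM (
    SELECT t.*,
           count(*) OVER (
               PARTITION BY arrival, departure, trip_id, stop_id,
                            stop_headsign, route_id, trip_headsign,
                            stop_code, stop_name, stop_lat, stop_lon
           ) AS dup_count          -- number of rows sharing these values
    FROM temp AS t
) AS counted
WHERE dup_count > 1                -- keep only rows that occur more than once
ORDER BY arrival, departure, trip_id, stop_id;
```

Like GROUP BY, PARTITION BY puts NULL values in the same group, so rows that differ only by a NULL in the same column (for example stop_headsign) are still counted as duplicates of each other.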
