sql - How to quickly find travel times between transit stops - using GTFS data in PostgreSQL

Question

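I have a PostgreSQL database (with PostGIS) and GTFS data ( https://developers.google.com/transit/gtfs/reference ) from several transit agencies. I have identified all potential transfer locations based on proximity and populated a table with that data. I now want to find travel times between points in the region, allowing up to 2 transfers. I created a view that joins all my tables to make my queries easier to read. Here is my view: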

CREATE OR REPLACE VIEW trip_planning_data_view AS 
 SELECT b.agency_id, i.agency_name, h.route_id, h.route_long_name, h.route_short_name, h.route_type, 
    e.trip_headsign, e.direction_id, a.stop_id AS stop_id_a, c.stop_name AS origin_stop_name, a.arrival_time AS origin_arrival_time, 
    b.stop_id AS stop_id_b, d.stop_name AS destination_stop_name, b.arrival_time AS destination_arrival_time, 
    b.arrival_time - a.arrival_time AS travel_time, 
    g.agency_id_b AS transfer_agency_id, g.stop_id_b AS transfer_stop_id, g.distance_meters AS transfer_distance_meters, 
    (round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval AS transfer_time, 
    b.arrival_time + ((round(g.distance_meters / 60::double precision)::character varying || ' Minutes'::character varying)::interval) AS transfer_arrival_time
   FROM stop_time a
   JOIN stop_time b ON a.agency_id = b.agency_id AND a.trip_id = b.trip_id AND a.stop_id <> b.stop_id AND a.stop_sequence < b.stop_sequence AND a.arrival_time < b.arrival_time
   JOIN stop c ON a.agency_id = c.agency_id AND a.stop_id = c.stop_id
   JOIN stop d ON b.agency_id = d.agency_id AND b.stop_id = d.stop_id
   JOIN trip e ON a.agency_id = e.agency_id AND a.trip_id = e.trip_id
   JOIN calendar f ON e.agency_id = f.agency_id AND e.service_id = f.service_id
   LEFT JOIN stop_transfers g ON b.agency_id = g.agency_id_a AND b.stop_id = g.stop_id_a
   JOIN route h ON e.agency_id = h.agency_id AND e.route_id = h.route_id
   JOIN agency i ON h.agency_id = i.agency_id
  WHERE f.monday = true
  ORDER BY a.stop_id, b.arrival_time - a.arrival_time;

(I'm only interested in Monday trips. I don't know why, but the ORDER BY clause in the view gives a huge performance improvement.)

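The tables follow the GTFS file structure, with the addition of the stop_transfers table, which contains the agency and stop IDs between which a transfer can be made, along with the distance between the stops.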
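As a side note on how the view turns that distance into a transfer time: the expression divides distance_meters by 60, which appears to assume a walking pace of about 60 meters per minute (that reading is an assumption, not something stated explicitly). A minimal sketch with a made-up distance of 300 meters, alongside an equivalent form that multiplies an interval instead of building a string:

```sql
-- 300 is a hypothetical distance in meters, used only for illustration.
SELECT
    -- Same expression shape as in the view: build the text '5 Minutes', then cast it to interval.
    (round(300 / 60::double precision)::character varying || ' Minutes')::interval AS transfer_time_view_style,
    -- Equivalent arithmetic without the string round trip.
    round(300 / 60::double precision) * interval '1 minute' AS transfer_time_interval_style;
-- Both columns evaluate to 00:05:00.
```

The second form is only a possible simplification; it has not been measured against the query below.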

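Querying this view for 1-transfer trips is very fast (usually under 1 second), but querying for 2-transfer trips takes a very long time (several minutes). Here is an example of a 2-transfer trip query: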

select *
from trip_planning_data_view t0 
join trip_planning_data_view t1 on t0.transfer_agency_id = t1.agency_id and t0.transfer_stop_id = t1.stop_id_a 
join trip_planning_data_view t2 on t1.transfer_agency_id = t2.agency_id and t1.transfer_stop_id = t2.stop_id_a 
where t0.agency_id = '1A' 
and t0.stop_id_a = 's101' 
and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval )
and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time) and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time) 
and t2.agency_id = '1A' 
and t2.stop_id_b = 's247' 
and t2.origin_arrival_time between (t1.origin_arrival_time + t1.travel_time + t1.transfer_time) and (t1.origin_arrival_time + '30 minutes'::interval + t1.travel_time + t1.transfer_time)
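
For comparison, the 1-transfer form that runs quickly would look roughly like the sketch below. It is derived by dropping t2 from the query above and moving the destination filter onto t1; the stop IDs and the 30-minute windows are simply carried over from the 2-transfer example, so treat it as an illustration rather than the exact query that was timed.

```sql
-- Sketch: one self-join of the view (one transfer) instead of two.
select *
from trip_planning_data_view t0
join trip_planning_data_view t1
  on t0.transfer_agency_id = t1.agency_id
 and t0.transfer_stop_id = t1.stop_id_a
where t0.agency_id = '1A'
  and t0.stop_id_a = 's101'
  -- first leg departs within 30 minutes of 08:00
  and t0.origin_arrival_time between ('08:00:00'::interval) and ('08:00:00'::interval + '30 minutes'::interval)
  -- second leg ends at the destination stop (assumed to be the same 's247' as above)
  and t1.agency_id = '1A'
  and t1.stop_id_b = 's247'
  -- second leg departs within 30 minutes of arriving at the transfer stop
  and t1.origin_arrival_time between (t0.origin_arrival_time + t0.travel_time + t0.transfer_time)
                                 and (t0.origin_arrival_time + '30 minutes'::interval + t0.travel_time + t0.transfer_time);
```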

Here is the query plan:

Nested Loop  (cost=168984.47..203333.30 rows=1 width=651)
  ->  Nested Loop  (cost=168984.47..203324.90 rows=1 width=686)
        Join Filter: (((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
        ->  Nested Loop  (cost=0.00..117.22 rows=1 width=252)
              Join Filter: ((a.agency_id)::text = (h.agency_id)::text)
              ->  Nested Loop  (cost=0.00..108.94 rows=1 width=216)
                    ->  Nested Loop  (cost=0.00..100.65 rows=1 width=220)
                          Join Filter: ((a.agency_id)::text = (e.agency_id)::text)
                          ->  Nested Loop  (cost=0.00..91.92 rows=1 width=198)
                                ->  Nested Loop  (cost=0.00..83.50 rows=1 width=161)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop Left Join  (cost=0.00..42.66 rows=1 width=112)
                                            ->  Nested Loop  (cost=0.00..34.29 rows=1 width=90)
                                                  ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..25.88 rows=1 width=53)
                                                        Index Cond: (((agency_id)::text = '1A'::text) AND ((stop_id)::text = 's247'::text))
                                                  ->  Index Scan using a_stop_idx on stop d  (cost=0.00..8.40 rows=1 width=44)
                                                        Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                            ->  Index Scan using stop_transfers_as_a_idx on stop_transfers g  (cost=0.00..8.35 rows=1 width=36)
                                                  Index Cond: (((b.agency_id)::text = (agency_id_a)::text) AND ((b.stop_id)::text = (stop_id_a)::text))
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                          ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                Index Cond: ((trip_id)::text = (a.trip_id)::text)
                    ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                          Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                          Filter: monday
              ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                    Index Cond: ((route_id)::text = (e.route_id)::text)
        ->  Nested Loop  (cost=168984.47..203207.60 rows=1 width=434)
              ->  Nested Loop  (cost=168984.47..203199.32 rows=1 width=477)
                    ->  Nested Loop  (cost=168984.47..203191.04 rows=1 width=520)
                          Join Filter: (((g.agency_id_b)::text = (b.agency_id)::text) AND ((g.stop_id_b)::text = (a.stop_id)::text) AND (a.arrival_time >= ((a.arrival_time + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)) AND (a.arrival_time <= (((a.arrival_time + '00:30:00'::interval) + (b.arrival_time - a.arrival_time)) + ((((round((g.distance_meters / 60::double precision)))::character varying)::text || ' Minutes'::text))::interval)))
                          ->  Nested Loop  (cost=168933.50..178461.70 rows=1 width=260)
                                ->  Nested Loop  (cost=168933.50..178453.41 rows=1 width=264)
                                      ->  Nested Loop  (cost=168933.50..178444.99 rows=1 width=227)
                                            Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                            ->  Nested Loop  (cost=168933.50..178404.16 rows=1 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=168933.50..178387.59 rows=2 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=168933.50..177724.50 rows=76 width=120)
                                                              ->  Merge Join  (cost=168933.50..170942.05 rows=869 width=89)
                                                                    Merge Cond: (((b.agency_id)::text = (g.agency_id_a)::text) AND ((b.stop_id)::text = (g.stop_id_a)::text))
                                                                    ->  Sort  (cost=144224.83..144325.07 rows=40096 width=53)
                                                                          Sort Key: b.agency_id, b.stop_id
                                                                          ->  Bitmap Heap Scan on stop_time b  (cost=1068.60..141159.25 rows=40096 width=53)
                                                                                Recheck Cond: ((agency_id)::text = '1A'::text)
                                                                                ->  Bitmap Index Scan on st_a_s_idx  (cost=0.00..1058.58 rows=40096 width=0)
                                                                                      Index Cond: ((agency_id)::text = '1A'::text)
                                                                    ->  Sort  (cost=24708.45..25274.92 rows=226587 width=36)
                                                                          Sort Key: g.agency_id_a, g.stop_id_a
                                                                          ->  Seq Scan on stop_transfers g  (cost=0.00..4553.87 rows=226587 width=36)
                                                              ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.79 rows=1 width=44)
                                                                    Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((stop_id)::text = (b.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.71 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..8.27 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.80 rows=1 width=53)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                                  Filter: ((arrival_time >= '08:11:00'::interval) AND (arrival_time <= '08:30:00'::interval) AND ((stop_id)::text = 's101'::text))
                                      ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                            Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                                ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                      Filter: monday
                          ->  Nested Loop  (cost=50.97..24729.27 rows=1 width=260)
                                ->  Nested Loop  (cost=50.97..24720.85 rows=1 width=223)
                                      Join Filter: (((a.stop_id)::text <> (b.stop_id)::text) AND (a.stop_sequence < b.stop_sequence) AND (a.arrival_time < b.arrival_time))
                                      ->  Nested Loop  (cost=50.97..24680.01 rows=1 width=232)
                                            ->  Nested Loop  (cost=50.97..24663.42 rows=2 width=236)
                                                  Join Filter: ((b.agency_id)::text = (h.agency_id)::text)
                                                  ->  Nested Loop  (cost=50.97..24447.16 rows=29 width=200)
                                                        Join Filter: ((b.agency_id)::text = (e.agency_id)::text)
                                                        ->  Nested Loop  (cost=50.97..15148.58 rows=1096 width=120)
                                                              ->  Nested Loop  (cost=50.97..12475.29 rows=59 width=80)
                                                                    ->  Bitmap Heap Scan on stop_transfers g  (cost=50.97..2141.81 rows=1375 width=36)
                                                                          Recheck Cond: ((agency_id_b)::text = '1A'::text)
                                                                          ->  Bitmap Index Scan on stop_transfers_as_b_idx  (cost=0.00..50.63 rows=1375 width=0)
                                                                                Index Cond: ((agency_id_b)::text = '1A'::text)
                                                                    ->  Index Scan using a_stop_idx on stop d  (cost=0.00..7.50 rows=1 width=44)
                                                                          Index Cond: (((agency_id)::text = (g.agency_id_a)::text) AND ((stop_id)::text = (g.stop_id_a)::text))
                                                              ->  Index Scan using st_a_s_idx on stop_time b  (cost=0.00..45.22 rows=6 width=53)
                                                                    Index Cond: (((agency_id)::text = (d.agency_id)::text) AND ((stop_id)::text = (d.stop_id)::text))
                                                        ->  Index Scan using trip_id_idx on trip e  (cost=0.00..8.47 rows=1 width=80)
                                                              Index Cond: ((trip_id)::text = (b.trip_id)::text)
                                                  ->  Index Scan using route_id_idx on route h  (cost=0.00..7.44 rows=1 width=41)
                                                        Index Cond: ((route_id)::text = (e.route_id)::text)
                                            ->  Index Scan using a_s_idx on calendar f  (cost=0.00..8.28 rows=1 width=20)
                                                  Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((service_id)::text = (e.service_id)::text))
                                                  Filter: monday
                                      ->  Index Scan using st_a_t_idx on stop_time a  (cost=0.00..40.78 rows=3 width=53)
                                            Index Cond: (((agency_id)::text = (b.agency_id)::text) AND ((trip_id)::text = (b.trip_id)::text))
                                ->  Index Scan using a_stop_idx on stop c  (cost=0.00..8.40 rows=1 width=44)
                                      Index Cond: (((agency_id)::text = (a.agency_id)::text) AND ((stop_id)::text = (a.stop_id)::text))
                    ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                          Index Cond: ((agency_id)::text = (a.agency_id)::text)
              ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
                    Index Cond: ((agency_id)::text = (a.agency_id)::text)
  ->  Index Scan using agency_id_idx on agency i  (cost=0.00..8.27 rows=1 width=31)
        Index Cond: ((agency_id)::text = (a.agency_id)::text)

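The query plan appears to be using the indexes. Any suggestions for optimizing this, or for a better approach, would be greatly appreciated. Thanks in advance.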

score 0 · Accepted Answer

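I think you would be better off using a tool like OpenTripPlanner ( http://www.opentripplanner.org/ ), an open-source transit routing engine that works with GTFS. It can answer a wide range of routing queries quickly and efficiently, including "the fastest time between two stops with N transfers allowed".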

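Alternatively, if the agency shares its data with Google (chances are good - http://www.google.com/landing/transit/cities/index.html ), you can use the Google Directions API ( https://developers.google.com/maps/documentation/directions/ ) to query transit routes between your two input locations.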
