
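Hi everyone, I'm a Python newbie who is still learning, and I was wondering if someone could help me with a problem I'm facing. I have four files, routes.txt, trips.txt, stop_times.txt and stops.txt, which look like this (each file has thousands of lines):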

routes.txt 
"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
"05","1","5",,,3,,"FFFF7C","000000"
"07","1","7",,,3,,"FFFF7C","000000"

trips.txt
"route_id","service_id","trip_id","trip_headsign","direction_id","block_id","shape_id"
"108","BUSN13-hbf13011-Weekday-02","19417636","Malden Station via Salem St.",1,"F411-75","1080037"
"94","BUSN13-hbf13011-Weekday-02","19417637","Medford Square via West Medford",0,"F94-5","940014"

stop_times.txt
"trip_id","arrival_time","departure_time","stop_id","stop_sequence","stop_headsign","pickup_type","drop_off_type"
"19417636","14:40:00","14:40:00","7412",1,,0,0
"19417636","14:41:00","14:41:00","6283",2,,0,0
"19417636","14:41:00","14:41:00","6284",3,,0,0

stops.txt
"stop_id","stop_code","stop_name","stop_desc","stop_lat","stop_lon","zone_id","stop_url","location_type","parent_station"
"place-alfcl","","Alewife Station","","42.395428","-71.142483","","",1,""
"place-alsgr","","Allston St. Station","","42.348701","-71.137955","","",1,""
"place-andrw","","Andrew Station","","42.330154","-71.057655","","",1,""

I'm trying to print rows by matching ID columns across the files. For example, if we have route_id = "01":

1. Check the route_id in routes.txt and see whether it equals a route_id in trips.txt.
2. If it matches, take the trip_id from trips.txt and compare it with the trip_id in stop_times.txt.
3. If that matches, check whether the stop_id equals a stop_id in stops.txt, and if so, print the row.

Note that a stop_id can be either a number or a string.

What I want is to print something like this, for example:

route_id, trip_id, arrival_time, departure_time, stop_name
01, 19417636, 14:40:00, 14:40:00, Alewife Station
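The lookup chain described above can be sketched with only the standard library's csv module, by loading each file into a dictionary keyed on its ID column instead of scanning every file for every row. This is a minimal sketch under assumptions: the sample rows are abbreviated from the files above, the route "108" row and the name of stop "7412" are hypothetical placeholders added so that one chain matches end to end, and join_gtfs is a hypothetical helper name.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple

# Abbreviated samples of the four files. The "108" route row and the name of
# stop "7412" are hypothetical placeholders so that one chain matches end to end.
ROUTES = '''"route_id","agency_id","route_short_name"
"01","1","1"
"108","1","108"
'''
TRIPS = '''"route_id","service_id","trip_id"
"108","BUSN13-hbf13011-Weekday-02","19417636"
"94","BUSN13-hbf13011-Weekday-02","19417637"
'''
STOP_TIMES = '''"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"19417636","14:40:00","14:40:00","7412",1
"19417636","14:41:00","14:41:00","6283",2
'''
STOPS = '''"stop_id","stop_name"
"7412","Stop 7412 (placeholder)"
"place-alfcl","Alewife Station"
'''


def join_gtfs(routes: TextIO, trips: TextIO, stop_times: TextIO,
              stops: TextIO) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (route_id, trip_id, arrival_time, departure_time, stop_name)."""
    # csv yields every field as str, so "01" keeps its leading zero and a
    # numeric-looking stop_id such as "7412" compares like "place-alfcl".
    route_ids = {row["route_id"] for row in csv.DictReader(routes)}
    # trip_id -> route_id, keeping only trips whose route is in routes.txt
    trip_to_route = {row["trip_id"]: row["route_id"]
                     for row in csv.DictReader(trips)
                     if row["route_id"] in route_ids}
    stop_names = {row["stop_id"]: row["stop_name"]
                  for row in csv.DictReader(stops)}
    # One pass over stop_times.txt; each lookup is a dict access, not a loop.
    for row in csv.DictReader(stop_times):
        route_id = trip_to_route.get(row["trip_id"])
        stop_name = stop_names.get(row["stop_id"])
        if route_id is not None and stop_name is not None:
            yield (route_id, row["trip_id"], row["arrival_time"],
                   row["departure_time"], stop_name)


if __name__ == "__main__":
    print("route_id, trip_id, arrival_time, departure_time, stop_name")
    for result in join_gtfs(io.StringIO(ROUTES), io.StringIO(TRIPS),
                            io.StringIO(STOP_TIMES), io.StringIO(STOPS)):
        print(", ".join(result))
```

With the real files, pass file objects opened with open("routes.txt", newline="") and so on; each file is read only once.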

Any help would be much appreciated.


2 Answers


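I think the easiest approach here is to load the data into a database and use SQL joins. You can simply use sqlite3, which is straightforward. Even an in-memory database would work, depending on how much data you have and how often the script runs.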
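A minimal sketch of that idea with Python's built-in sqlite3 module, assuming trimmed-down tables that hold only the columns the join needs (the real files have more). The rows are abbreviated from the question's samples; the route "108" row and the name for stop "7412" are hypothetical placeholders so that one chain matches, and SCHEMA, QUERY and build_demo_db are illustrative names.

```python
import sqlite3

# Schema trimmed to the join columns (an assumption; the real files have more).
# TEXT affinity stores numbers as text, so stop_id 7412 and "place-alfcl"
# share one column and compare as strings.
SCHEMA = """
CREATE TABLE routes (route_id TEXT PRIMARY KEY);
CREATE TABLE trips (route_id TEXT, trip_id TEXT PRIMARY KEY);
CREATE TABLE stop_times (trip_id TEXT, arrival_time TEXT,
                         departure_time TEXT, stop_id TEXT);
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
"""

# routes -> trips -> stop_times -> stops, the same chain the question describes.
QUERY = """
SELECT r.route_id, t.trip_id, st.arrival_time, st.departure_time, s.stop_name
FROM routes AS r
JOIN trips AS t ON t.route_id = r.route_id
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
ORDER BY t.trip_id, st.arrival_time
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory: nothing is written to disk
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO routes VALUES (?)", [("01",), ("108",)])
    conn.executemany("INSERT INTO trips VALUES (?, ?)",
                     [("108", "19417636"), ("94", "19417637")])
    # 7412 is inserted as an int on purpose; TEXT affinity stores it as '7412'.
    conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?, ?)",
                     [("19417636", "14:40:00", "14:40:00", 7412),
                      ("19417636", "14:41:00", "14:41:00", "6283")])
    conn.executemany("INSERT INTO stops VALUES (?, ?)",
                     [("7412", "Stop 7412 (placeholder)"),
                      ("place-alfcl", "Alewife Station")])
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute(QUERY):
        print(row)
```

Declaring the ID columns as TEXT is the design choice that handles "a stop_id can be a number or a string": SQLite converts numeric values inserted into a TEXT column to text, so every comparison in the joins is a string comparison.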

Make sure to create indexes on the foreign-key columns (the fields you join on), otherwise the lookups can be slow.
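A sketch of the indexing advice, assuming the same trimmed schema as a sqlite3 setup in which routes.route_id and stops.stop_id are declared PRIMARY KEY (SQLite already backs those with an index). The index names and the create_join_indexes helper are hypothetical.

```python
import sqlite3

# Hypothetical index names; each covers a column used in a JOIN ... ON clause.
INDEXES = {
    "idx_trips_route_id": "trips (route_id)",
    "idx_stop_times_trip_id": "stop_times (trip_id)",
    "idx_stop_times_stop_id": "stop_times (stop_id)",
}


def create_join_indexes(conn: sqlite3.Connection) -> None:
    for name, target in INDEXES.items():
        # Names come from the constant above, not user input, so building
        # the statement with an f-string is acceptable here.
        conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON {target}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE trips (route_id TEXT, trip_id TEXT PRIMARY KEY);
        CREATE TABLE stop_times (trip_id TEXT, arrival_time TEXT,
                                 departure_time TEXT, stop_id TEXT);
    """)
    create_join_indexes(conn)
    # EXPLAIN QUERY PLAN reports whether an index is used for each step;
    # the exact wording of the plan differs between SQLite versions.
    plan = conn.execute("""EXPLAIN QUERY PLAN
        SELECT t.trip_id, st.arrival_time
        FROM trips AS t JOIN stop_times AS st ON st.trip_id = t.trip_id
        WHERE t.route_id = ?""", ("01",))
    for row in plan:
        print(row[-1])  # last column is the human-readable detail
```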

Also, sqlite3 can import data directly from CSV files. Just create the tables, then use the ".import" command (run sqlite3 and type .help, or see the documentation).
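".import" is a command of the sqlite3 command-line shell, not SQL, so it is not available through Python's sqlite3 module. A rough Python stand-in, assuming every column should be stored as TEXT, reads the header with the csv module and bulk-inserts the rows with executemany. load_csv is a hypothetical helper; the test data is the routes.txt sample from the question.

```python
import csv
import io
import sqlite3
from typing import TextIO

# First lines of routes.txt from the question.
ROUTES_TXT = '''"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
"05","1","5",,,3,,"FFFF7C","000000"
"07","1","7",,,3,,"FFFF7C","000000"
'''


def load_csv(conn: sqlite3.Connection, table: str, fileobj: TextIO) -> None:
    """Create `table` from the CSV header (all columns TEXT) and insert every row.

    Table and column names are pasted into the SQL, so only use trusted files.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # executemany consumes the remaining rows straight from the reader.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # For a real file: with open("routes.txt", newline="") as f: load_csv(conn, "routes", f)
    load_csv(conn, "routes", io.StringIO(ROUTES_TXT))
    print(conn.execute("SELECT route_id, route_color FROM routes").fetchall())
```

Empty fields such as route_long_name are stored as empty strings rather than NULL.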

Sean

answered 2013-03-21T13:45:21.130

What you are trying to do is called a join operation, and it can be done quite easily with the pandas library:

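import pandas as pd

# dtype=str reads every column as text, so IDs such as "01" keep their
# leading zeros and numeric-looking stop_ids can match string ones.
routes = pd.read_csv('routes.txt', dtype=str)
trips = pd.read_csv('trips.txt', dtype=str)
stop_times = pd.read_csv('stop_times.txt', dtype=str)
stops = pd.read_csv('stops.txt', dtype=str)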

You may have to change the options to read_csv so that it interprets your data correctly (especially the leading zeros on the route_ids).
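To illustrate the leading-zero problem, here is a small sketch that parses the question's routes.txt sample from a string. By default read_csv infers types even for quoted fields, so "01" becomes the integer 1 and "000000" becomes 0; passing dtype=str keeps the text as written. A related pitfall, stated here as a likely outcome rather than a certainty: stops.txt mixes IDs like "7412" and "place-alfcl", so its stop_id column is read as strings while the all-numeric stop_id in stop_times.txt is read as integers, and recent pandas versions refuse to merge an int64 key with an object key.

```python
import io

import pandas as pd

# First rows of routes.txt from the question.
ROUTES_TXT = '''"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
'''

# Default type inference: the quoted "01" still parses as the integer 1.
inferred = pd.read_csv(io.StringIO(ROUTES_TXT))
# dtype=str keeps every column as text; a dict such as
# dtype={"route_id": str, "stop_id": str} limits this to the ID columns.
as_text = pd.read_csv(io.StringIO(ROUTES_TXT), dtype=str)

print(inferred["route_id"].tolist())          # [1, 4]
print(as_text["route_id"].tolist())           # ['01', '04']
print(inferred["route_text_color"].tolist())  # [0, 0]
```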

#   Please excuse the Dr. Seuss variable names
routes_trips = pd.merge(routes, trips, on=['route_id'])
routes_trips_stop_times = pd.merge(routes_trips, stop_times, on=['trip_id'])
routes_trips_stop_times_names = pd.merge(routes_trips_stop_times, stops, on=['stop_id'])

By default, pandas performs an inner join, so you will end up with only the rows whose route_ids, trip_ids and stop_ids match.
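The effect of the default inner join, and a way to see which rows it drops, can be shown with two tiny frames shaped like stop_times.txt and stops.txt. This sketch assumes IDs are already strings; the stop name "Stop 7412 (placeholder)" is made up. Switching to how="left" with indicator=True keeps every stop_times row and labels those that found no stop.

```python
import pandas as pd

# Tiny frames shaped like stop_times.txt and stops.txt; the name of stop
# "7412" is a hypothetical placeholder, and IDs are kept as strings.
stop_times = pd.DataFrame({
    "trip_id": ["19417636", "19417636", "19417636"],
    "arrival_time": ["14:40:00", "14:41:00", "14:41:00"],
    "stop_id": ["7412", "6283", "6284"],
})
stops = pd.DataFrame({
    "stop_id": ["7412", "place-alfcl"],
    "stop_name": ["Stop 7412 (placeholder)", "Alewife Station"],
})

# how="inner" is the default: only rows whose stop_id exists in stops survive.
inner = pd.merge(stop_times, stops, on="stop_id")
# how="left" keeps every stop_times row; indicator=True adds a "_merge"
# column that is "both" for matched rows and "left_only" otherwise.
left = pd.merge(stop_times, stops, on="stop_id", how="left", indicator=True)
unmatched = left.loc[left["_merge"] == "left_only", "stop_id"].tolist()

print(len(inner))  # 1
print(unmatched)   # ['6283', '6284']
```

The left-join variant is useful for checking data quality before trusting the inner join, since silently dropped rows are otherwise easy to miss.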

answered 2013-03-21T02:20:43.370