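I have the following program, which has been running for about two hours and probably still has about a quarter of its run left. My questions are below the code: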
import csv

input_csv = "LOCATION_ID.csv"
input2 = "CITIES.csv"
output_csv = "OUTPUT_CITIES.csv"

with open(input_csv, "rb") as infile:
    input_fields = ("ID", "CITY_DECODED", "CITY", "STATE", "COUNTRY", "SPELL1", "SPELL2", "SPELL3")
    reader = csv.DictReader(infile, fieldnames = input_fields)
    with open(input2, "rb") as infile2:
        input_fields2 = ("Latitude", "Longitude", "City")
        reader2 = csv.DictReader(infile2, fieldnames = input_fields2)
        next(reader2)
        words = []
        for next_row in reader2:
            words.append(next_row["City"])
        with open(output_csv, "wb") as outfile:
            output_fields = ("EXISTS","ID", "CITY_DECODED", "CITY", "STATE", "COUNTRY", "SPELL1", "SPELL2", "SPELL3")
            writer = csv.DictWriter(outfile, fieldnames = output_fields)
            writer.writerow(dict((h,h) for h in output_fields))
            next(reader)
            for next_row in reader:
                search_term = next_row["CITY_DECODED"]
                #I think the problem is here where I run through every city
                #in "words", even though all I want to know is if the city
                #in "search_term" exists in "words"
                for item in words:
                    if search_term in words:
                        next_row["EXISTS"] = 1
                writer.writerow(next_row)
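To get a feel for how much work the inner loop does, it can be counted out on toy data. This is only a rough sketch, not the program above: the function names and the sample lists are made up for illustration, and a list membership test (`search_term in words`) is modelled as a plain element-by-element scan. By the same counting, and assuming no city ever matches, 14,000 rows against 6,000 cities would come to roughly 14,000 × 6,000 × 6,000 ≈ 5 × 10^11 comparisons, versus about 8.4 × 10^7 for a single scan per row.

```python
def count_comparisons_nested(rows, words):
    """Count comparisons when the membership test is repeated once per item."""
    comparisons = 0
    for search_term in rows:
        for _item in words:  # the loop variable is never used, as in the question
            # Model of `search_term in words` on a list: scan until a match.
            for candidate in words:
                comparisons += 1
                if candidate == search_term:
                    break
    return comparisons


def count_comparisons_single_scan(rows, words):
    """Count comparisons when the membership test runs once per row."""
    comparisons = 0
    for search_term in rows:
        for candidate in words:
            comparisons += 1
            if candidate == search_term:
                break
    return comparisons


# Hypothetical sample data, standing in for the two CSV columns.
rows = ["Paris", "Atlantis", "Oslo"]
words = ["Lima", "Oslo", "Paris", "Rome"]

nested = count_comparisons_nested(rows, words)
single = count_comparisons_single_scan(rows, words)
print(nested, single)
```

On this toy input the nested version does exactly `len(words)` times the work of the single scan, because the outer `for item in words:` loop repeats the same test without ever using `item`.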
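I have a few questions here:
1. Given that input_csv has 14k rows and input2 only has 6k rows, why does this take so long? I understand that the innermost for loop (the one starting with "for item in words:") is inefficient (see question 3), but I'd like a more intuitive understanding of what is happening behind the scenes, so that I (and hopefully other SO users) can avoid making the same mistake in our other programs.
2. If I want this code to keep running, what happens when I leave the computer and it goes into sleep/hibernate? Does the program stop at that point and then resume on its own when the computer is used again? I'd really like to understand how the compiler interacts with the operating system once it is running the program, and what the computer "going to sleep" means for a Python program.
3. What would be a more efficient implementation of this code? I'd think this shouldn't take more than a few minutes, right?
Thanks very much!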
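
For question 3, one possible direction is to load the city names into a `set` once and test each row against it a single time. The sketch below is a guess at what that could look like, not a tested replacement: it is written for Python 3 rather than the Python 2 style above, the function name `mark_existing` is made up, and it works on already-open file objects so it can be tried on small in-memory samples. The field names are the ones from the question.

```python
import csv
import io

INPUT_FIELDS = ("ID", "CITY_DECODED", "CITY", "STATE", "COUNTRY",
                "SPELL1", "SPELL2", "SPELL3")
CITY_FIELDS = ("Latitude", "Longitude", "City")
OUTPUT_FIELDS = ("EXISTS",) + INPUT_FIELDS


def mark_existing(location_file, cities_file, out_file):
    """Copy location rows to out_file, setting EXISTS to 1 for known cities."""
    city_reader = csv.DictReader(cities_file, fieldnames=CITY_FIELDS)
    next(city_reader)  # skip the header row, as in the question
    # Build the lookup once; set membership is constant time on average.
    known_cities = {row["City"] for row in city_reader}

    reader = csv.DictReader(location_file, fieldnames=INPUT_FIELDS)
    next(reader)  # skip the header row
    writer = csv.DictWriter(out_file, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    for row in reader:
        if row["CITY_DECODED"] in known_cities:  # one test per row
            row["EXISTS"] = 1
        # Rows without a match leave EXISTS blank, as in the question.
        writer.writerow(row)


# Hypothetical sample data, standing in for LOCATION_ID.csv and CITIES.csv.
locations = io.StringIO(
    "ID,CITY_DECODED,CITY,STATE,COUNTRY,SPELL1,SPELL2,SPELL3\n"
    "1,Oslo,OSLO,,NO,,,\n"
    "2,Atlantis,ATLNTS,,XX,,,\n"
)
cities = io.StringIO("Latitude,Longitude,City\n59.91,10.75,Oslo\n")
output = io.StringIO()

mark_existing(locations, cities, output)
print(output.getvalue())
```

With real files under Python 3, the three file objects would come from `open(path, newline="")` (plus `"w"` for the output) instead of `io.StringIO`.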