感谢用户 Petri,我有一个 CSV 到 JSON Python 脚本,让我可以将 Geonames CSV 转储转换为 MongoImport 友好的 JSON。
问题是 Geonames 有一个名为的字段,该字段alternatenames
当前被引用并视为一个长字符串。因此无法在 MongoDB 中正确查询。我想将该字段更改为字符串数组,例如:"alternatenames":["name1", "name2"]
Python 脚本如下所示:
import csv, simplejson, decimal, codecs
data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')
with codecs.open("cities.json", "w", encoding="utf-8") as out:
for r in reader:
for k, v in r.items():
# make sure nulls are generated
if not v:
r[k] = None
# parse and generate decimal arrays
elif k == "loc":
r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
# generate a number
elif k == "geonameid":
r[k] = int(v)
out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")
我的 CSV 有以下字段:
"geonameid","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]","P","PPL","IR",,"15",,,
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]","P","PPL","IR",,"15",,,
我当前的 JSON 输出如下所示:
{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}
我想更改 JSON 输出以添加一个字符串数组,如下所示(向右滚动到alternatenames
):
{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi,Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}
另外,我应该将我quotechar
的 Access 2010 导出的 CSV 更改为^
而不是"
避免双引号吗?
谢谢你的帮助。