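I put together a simple demo to check whether geograpy can do what I'm looking for: finding the country name and ISO code in non-normalized addresses (which is basically what geograpy is for!).
The problem is that, in the tests I ran, geograpy finds several countries for each address, in most cases including the correct one, but I couldn't find any kind of parameter for deciding which country is the most "correct".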
- John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti
- John Doe 160 Huntington Terrace Newark, New York 07112 United States of America
- John Doe 30 Huntington Terrace Newark, New York 07112 USA
- John Doe 22 Huntington Terrace Newark, New York 07112 US
- Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia
- Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy
import geograpy

# Test addresses: the same place written with different country spellings
ind = ["John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti",
       "John Doe 160 Huntington Terrace Newark, New York 07112 United States of America",
       "John Doe 30 Huntington Terrace Newark, New York 07112 USA",
       "John Doe 22 Huntington Terrace Newark, New York 07112 US",
       "Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia",
       "Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy"]

locator = geograpy.locator.Locator()
for address in ind:
    places = geograpy.get_place_context(text=address)
    print(address)
    # Print every country geograpy detected for this address, with its ISO code
    for country in places.countries:
        print("Country:"+country+", IsoCode:"+locator.getCountry(name=country).iso)
John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
John Doe 160 Huntington Terrace Newark, New York 07112 United States of America
Country:United States, IsoCode:US
Country:United Kingdom, IsoCode:GB
Country:Netherlands, IsoCode:NL
Country:Jamaica, IsoCode:JM
Country:Argentina, IsoCode:AR
John Doe 30 Huntington Terrace Newark, New York 07112 USA
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
John Doe 22 Huntington Terrace Newark, New York 07112 US
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia
Country:Australia, IsoCode:AU
Country:Sweden, IsoCode:SE
Country:United States, IsoCode:US
Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy
Country:Italy, IsoCode:IT
Country:Australia, IsoCode:AU
Country:Sweden, IsoCode:SE
Country:United States, IsoCode:US
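First of all, the biggest problem is that for the Italian address (number 4, the one ending in "Italia") the correct country (Italy/Italia) is not found at all, and I have no idea where the three countries that were found come from.
I apologize in advance: I haven't had time to dig deep into geograpy3 and I don't know whether this is a silly question, but I couldn't find anything in the documentation about confidence, probability, or distance.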
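To make concrete what I mean by "most correct", here is a naive sketch of the kind of ranking I could bolt on afterwards. It is not something geograpy provides: the `ALIASES` table and the `pick_country` function are hypothetical names of my own, and the alias list is hand-written just for these test addresses. The idea is to prefer the candidate whose spelling appears closest to the end of the address, and to return `None` when no candidate is literally mentioned.

```python
import re

# Hypothetical alias table, written by hand for this example (not part of geograpy):
# ISO code -> lowercase spellings that might appear in an address.
ALIASES = {
    "US": ["united states of america", "united states", "usa", "us", "stati uniti"],
    "IT": ["italy", "italia"],
    "GB": ["united kingdom", "uk"],
    "JM": ["jamaica"],
}


def pick_country(address, iso_codes):
    """Return the candidate ISO code whose alias ends closest to the end of
    the address, or None if no candidate is mentioned literally."""
    text = address.lower()
    best_iso, best_end = None, -1
    for iso in iso_codes:
        for alias in ALIASES.get(iso, []):
            # Whole-word match, so "us" does not match inside "usa" or "Huntington"
            pattern = r"\b" + re.escape(alias) + r"\b"
            for match in re.finditer(pattern, text):
                if match.end() > best_end:
                    best_iso, best_end = iso, match.end()
    return best_iso


# Candidates are listed in the order geograpy returned them in the output above.
print(pick_country("John Doe 30 Huntington Terrace Newark, New York 07112 USA",
                   ["GB", "JM", "US"]))
print(pick_country("Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy",
                   ["IT", "AU", "SE", "US"]))
print(pick_country("Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia",
                   ["AU", "SE", "US"]))
```

This only helps when the country is written out in the address and is already among the candidates, so it does nothing for the "Italia" case where geograpy misses the country entirely. A real confidence score from the library would still be preferable.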