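I put together a simple demo to check whether geograpy can do what I am looking for: finding the country name and ISO code in non-normalized addresses (which is basically what geograpy is meant for!).
The problem is that, in my tests, geograpy finds several countries for each address, in most cases including the correct one, but I cannot find any kind of parameter for deciding which country is the most "correct".
The list of fake addresses I used, which should be representative of the real data to be analyzed, is this: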
- John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti
- John Doe 160 Huntington Terrace Newark, New York 07112 United States of America
- John Doe 30 Huntington Terrace Newark, New York 07112 USA
- John Doe 22 Huntington Terrace Newark, New York 07112 US
- Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia
- Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy
This is the simple code I wrote:
import geograpy

# Test addresses; the country is spelled differently in each one.
ind = ["John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti",
       "John Doe 160 Huntington Terrace Newark, New York 07112 United States of America",
       "John Doe 30 Huntington Terrace Newark, New York 07112 USA",
       "John Doe 22 Huntington Terrace Newark, New York 07112 US",
       "Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia",
       "Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy"]

locator = geograpy.locator.Locator()
for address in ind:
    places = geograpy.get_place_context(text=address)
    print(address)
    #print(places)
    # Print every country geograpy reports for this address, with its ISO code.
    for country in places.countries:
        print("Country:"+country+", IsoCode:"+locator.getCountry(name=country).iso)
    print()
This is the output:
John Doe 115 Huntington Terrace Newark, New York 07112 Stati Uniti
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
John Doe 160 Huntington Terrace Newark, New York 07112 United States of America
Country:United States, IsoCode:US
Country:United Kingdom, IsoCode:GB
Country:Netherlands, IsoCode:NL
Country:Jamaica, IsoCode:JM
Country:Argentina, IsoCode:AR
John Doe 30 Huntington Terrace Newark, New York 07112 USA
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
John Doe 22 Huntington Terrace Newark, New York 07112 US
Country:United Kingdom, IsoCode:GB
Country:Jamaica, IsoCode:JM
Country:United States, IsoCode:US
Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia
Country:Australia, IsoCode:AU
Country:Sweden, IsoCode:SE
Country:United States, IsoCode:US
Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italy
Country:Italy, IsoCode:IT
Country:Australia, IsoCode:AU
Country:Sweden, IsoCode:SE
Country:United States, IsoCode:US
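First of all, the biggest problem is that for the Italian address ending in "Italia" (index 4, counting from 0) the correct country (Italy/Italia) is not found at all, and I have no idea where the three countries that were found (Australia, Sweden, United States) come from.
In most cases it finds wrong countries alongside the correct one, and I get no confidence percentage, distance, or any other metric that would tell me whether a country can be considered an acceptable answer and, among multiple results, which one is likely the "best".
I apologize in advance: I have not had time to dig deeper into geograpy3, and I do not know whether this is a silly question, but I found nothing in the documentation about confidence, probability, or distance.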
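For what it is worth, one fallback I am considering is to trust whatever country spelling the address ends with, and use it to pick among (or override) the candidates geograpy returns. The sketch below is only an illustration of that idea and does not use any geograpy API: `ALIASES`, `trailing_country`, and `pick_country` are hypothetical names of my own, and the alias table covers only the spellings in my test list.

```python
import re

# Hypothetical alias table (an assumption, not part of geograpy): lowercase
# spellings that may end an address, mapped to ISO 3166-1 alpha-2 codes.
ALIASES = {
    "stati uniti": "US",
    "united states of america": "US",
    "united states": "US",
    "usa": "US",
    "us": "US",
    "italia": "IT",
    "italy": "IT",
}


def trailing_country(address, aliases=ALIASES):
    """Return the ISO code whose alias ends the address, or None."""
    # Normalize: lowercase and reduce punctuation/whitespace to single spaces.
    text = " ".join(re.findall(r"\w+", address.lower()))
    # Check longer aliases first so a multi-word spelling is matched whole.
    for alias in sorted(aliases, key=len, reverse=True):
        if text == alias or text.endswith(" " + alias):
            return aliases[alias]
    return None


def pick_country(address, candidate_isos):
    """Return (iso, confirmed).

    iso is the code implied by the end of the address (or None);
    confirmed is True only if that code is also among the candidates.
    """
    iso = trailing_country(address)
    return iso, (iso is not None and iso in candidate_isos)


if __name__ == "__main__":
    # Candidate lists copied from the output above.
    print(pick_country("John Doe 30 Huntington Terrace Newark, New York 07112 USA",
                       ["GB", "JM", "US"]))
    print(pick_country("Mario Bianchi, Via Nazionale 256, 00148 Roma (RM) Italia",
                       ["AU", "SE", "US"]))
```

The second call is the interesting one: the trailing "Italia" implies IT even though it is missing from the candidates, so `confirmed` comes back `False` and the mismatch can be flagged. This obviously breaks down when the address does not end with a country, which is why I would still prefer a real confidence score from the library.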