I'm using the Python package ahocorasick ( https://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ ) to match text against state names:
import ahocorasick
states = {
    'AK': 'Alaska',
    'AL': 'Alabama',
    'AR': 'Arkansas',
    'AS': 'American Samoa',
    'AZ': 'Arizona',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut'
}
def LoadKeywords(keywords):
    # keywords should be a list of strings
    tree = ahocorasick.KeywordTree()
    for k in keywords:
        tree.add(k)
    tree.make()
    return tree

keywordLong = states.values()
keywordLongTree = LoadKeywords(keywordLong)
Then I try a search:
keywordLongTree.search("Alabama")
It returns:
(0, 7)
That is fine and valid, but when I do this:
keywordLongTree.search("I don't know why this happen")
it should return None, but instead it returns:
(145331, 145335)
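To make the expected behavior concrete, here is a naive pure-Python sketch of what I think the search should do. Note that naive_search is a hypothetical helper written only for comparison; it is not part of ahocorasick, and returning the earliest match as a (start, end) pair is my assumption about the intended semantics.

```python
def naive_search(keywords, text):
    """Return (start, end) of the earliest keyword match in text, or None.

    Hypothetical comparison helper, not part of the ahocorasick package.
    """
    best = None
    for k in keywords:
        start = text.find(k)  # -1 when k does not occur in text
        if start != -1 and (best is None or start < best[0]):
            best = (start, start + len(k))
    return best


state_names = [
    'Alaska', 'Alabama', 'Arkansas', 'American Samoa',
    'Arizona', 'California', 'Colorado', 'Connecticut',
]

print(naive_search(state_names, "Alabama"))
print(naive_search(state_names, "I don't know why this happen"))
```

With this check, the first call gives (0, 7) and the second gives None, which is what I expected from keywordLongTree.search as well.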
Has anyone run into this before? Why does it happen?