此示例适用于小型 url 列表,但不能很好地扩展。
def longest_prefix_match(search, urllist):
matches = [url for url in urllist if url.startswith(search)]
if matches:
return max(matches, key=len)
else:
raise Exception("Not found")
使用trie模块的实现。
import trie
def longest_prefix_match(prefix_trie, search):
# There may well be a more elegant way to do this without using
# "hidden" method _getnode.
try:
return list(node.value for node in prefix_trie._getnode(search).walk())
except KeyError:
return list()
url_list = [
'http://www.google.com/mail',
'http://www.google.com/document',
'http://www.facebook.com',
]
url_trie = trie.Trie()
for url in url_list:
url_trie[url] = url
searches = ("http", "http://www.go", "http://www.fa", "http://fail")
for search in searches:
print "'%s' ->" % search, longest_prefix_match(url_trie, search)
结果:
'http' -> ['http://www.facebook.com', 'http://www.google.com/document', 'http://www.google.com/mail']
'http://www.go' -> ['http://www.google.com/document', 'http://www.google.com/mail']
'http://www.fa' -> ['http://www.facebook.com']
'http://fail' -> []
或使用PyTrie给出相同的结果,但列表的顺序不同。
from pytrie import StringTrie
url_list = [
'http://www.google.com/mail',
'http://www.google.com/document',
'http://www.facebook.com',
]
url_trie = StringTrie()
for url in url_list:
url_trie[url] = url
searches = ("http", "http://www.go", "http://www.fa", "http://fail")
for search in searches:
print "'%s' ->" % search, url_trie.values(prefix=search)
从内存使用的角度来看,我开始认为基数树/帕特里夏树会更好。这就是基数树的样子:
而特里看起来更像: