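Without giving any thought to tuning, I would start by assuming I have a list of names and their frequencies, then construct a dictionary that maps each prefix to the set of names with that prefix, and finally turn each set into a list of only the top 5 names by frequency.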
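I used a list of boys' names derived from here to create a text file in which each line is an integer frequency of occurrence, some spaces, and then a name, like this: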
8427 OLIVER
7031 JACK
6862 HARRY
5478 ALFIE
5410 CHARLIE
5307 THOMAS
5256 WILLIAM
5217 JOSHUA
4542 GEORGE
4351 JAMES
4330 DANIEL
4308 JACOB
...
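Each line in this format can be parsed by splitting on the first run of whitespace. The following is only a minimal sketch: `parse_line` is a hypothetical helper (it is not part of the code in this answer), and tolerating blank lines and names that contain spaces is an assumption about what a real data file might hold.

```python
def parse_line(line):
    """Return (name, freq) for one 'freq  NAME' line, or None if it is blank."""
    # Split on the first run of whitespace only, so a name that itself
    # contains spaces would stay in one piece (assumed requirement).
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None  # blank or incomplete line
    freq, name = parts
    return name.strip(), int(freq)


# A few lines in the format shown above, plus a blank line and extra spaces.
sample = """8427 OLIVER
7031 JACK

6862   HARRY
"""
parsed = [p for p in map(parse_line, sample.splitlines()) if p is not None]
print(parsed)  # [('OLIVER', 8427), ('JACK', 7031), ('HARRY', 6862)]
```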
The following code constructs the dictionary:
from collections import defaultdict

MAX_SUGGEST = 5

def gen_autosuggest(name_freq_file_name):
    with open(name_freq_file_name) as f:
        name2freq = {}
        for nf in f:
            freq, name = nf.split()
            if name not in name2freq:
                name2freq[name] = int(freq)
    pre2suggest = defaultdict(list)
    for name, freq in sorted(name2freq.items(), key=lambda x: -x[1]):
        # in decreasing order of popularity
        for i, _ in enumerate(name, 1):
            prefix = name[:i]
            pre2suggest[prefix].append((name, name2freq[name]))
    # set max suggestions
    return {pre:namefs[:MAX_SUGGEST]
            for pre, namefs in pre2suggest.items()}

if __name__ == '__main__':
    pre2suggest = gen_autosuggest('2010boysnames_popularity_engwales2.txt')
If you give the dict your prefix, it returns your suggestions (in this case along with their frequencies, but those can be discarded if not needed):
>>> len(pre2suggest)
15303
>>> pre2suggest['OL']
[('OLIVER', 8427), ('OLLIE', 1130), ('OLLY', 556), ('OLIVIER', 175), ('OLIWIER', 103)]
>>> pre2suggest['OLI']
[('OLIVER', 8427), ('OLIVIER', 175), ('OLIWIER', 103), ('OLI', 23), ('OLIVER-JAMES', 16)]
>>>
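If only the names are wanted, the frequencies can be dropped at lookup time. This is a minimal sketch, assuming the dictionary has the shape shown in the session above. `suggest_names` is a hypothetical helper, not part of the original code, and upper-casing the prefix assumes every name in the file is upper case, as in the sample lines.

```python
def suggest_names(pre2suggest, prefix):
    """Return just the suggested names for a prefix, most popular first."""
    # Hypothetical helper: assumes keys are upper-case prefixes that map to
    # lists of (name, frequency) tuples already sorted by popularity.
    return [name for name, _freq in pre2suggest.get(prefix.upper(), [])]


# Tiny hand-made dictionary, copied from part of the 'OL' entry shown above.
sample = {'OL': [('OLIVER', 8427), ('OLLIE', 1130), ('OLLY', 556)]}
print(suggest_names(sample, 'ol'))  # ['OLIVER', 'OLLIE', 'OLLY']
print(suggest_names(sample, 'zz'))  # [] (unknown prefix)
```

Using `dict.get` with an empty list as the default means an unknown prefix gives no suggestions instead of raising `KeyError`.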
Look, no trie :-)
Time killer
If it takes a long time to run, you could pre-compute the dict and save it to a file, then load the pre-computed values when needed using the pickle module:
>>> import pickle
>>>
>>> savename = 'pre2suggest.pcl'
>>> with open(savename, 'wb') as f:
        pickle.dump(pre2suggest, f)

>>> # restore it
>>> with open(savename, 'rb') as f:
        p2s = pickle.load(f)

>>> p2s == pre2suggest
True
>>>
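The save and restore steps can be combined so that the dict is built only when no saved copy exists yet. This is only a sketch of that idea: `load_or_build` and `build_demo` are hypothetical names, and the demo uses a stand-in builder and a temporary directory instead of the real names file.

```python
import os
import pickle
import tempfile


def load_or_build(cache_name, build):
    """Load a pickled dict from cache_name, or build it and save it there."""
    # Hypothetical helper: `build` is any zero-argument callable that
    # returns the prefix dictionary.
    if os.path.exists(cache_name):
        with open(cache_name, 'rb') as f:
            return pickle.load(f)
    result = build()
    with open(cache_name, 'wb') as f:
        pickle.dump(result, f)
    return result


# Demo with a stand-in builder that records how often it is called.
calls = []

def build_demo():
    calls.append(1)
    return {'OL': [('OLIVER', 8427)]}

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'pre2suggest.pcl')
    first = load_or_build(cache, build_demo)   # builds and saves
    second = load_or_build(cache, build_demo)  # loads from the file
```

With the code from this answer, the call might look like `load_or_build('pre2suggest.pcl', lambda: gen_autosuggest('2010boysnames_popularity_engwales2.txt'))`. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.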