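I have a huge list of person names that I have to search for in a huge text.
Only part of a name may appear in the text, and it may be misspelled, mistyped, or abbreviated. The text is not tokenized or tagged, so I don't know where a person's name starts, and I don't know whether a given name appears in the text at all.
Example:
I have "Barack Hussein Obama" in my list, so I have to check whether that name occurs in each of the following texts: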
- ...The candidate Barack Obama was elected the president of the United States... (incomplete)
- ...The candidate Barack Hussein was elected the president of the United States... (incomplete)
- ...The candidate Barack HO was elected the president of the United States... (abbreviated)
- ...The candidate Barack ObaNa was elected the president of the United States... (misspelled)
- ...The candidate Barack OVama was elected the president of the United States... (mistyped, B is next to V on the keyboard)
- ...The candidate John McCain lost the election... (no occurrence of Obama's name)
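To make the cases above concrete, here is a minimal Python sketch of a naive baseline, not a proposed solution. It slides a window of tokens over the text and counts how many parts of the name are covered, either by a fuzzy match from the standard library's `difflib.SequenceMatcher` or by a short all-caps token read as initials. The helper names (`similar`, `count_matched_parts`, `mentions`), the 0.75 similarity threshold, and the "at least two parts" rule are all arbitrary assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def similar(part: str, token: str, threshold: float = 0.75) -> bool:
    """Case-insensitive fuzzy comparison of one name part and one text token."""
    ratio = SequenceMatcher(None, part.lower(), token.lower()).ratio()
    return ratio >= threshold  # threshold is an arbitrary assumption


def count_matched_parts(parts: list, window: list) -> int:
    """Count how many name parts are covered by the tokens in `window`."""
    initials = "".join(p[0].upper() for p in parts)  # e.g. "BHO"
    matched = set()
    for token in window:
        for i, part in enumerate(parts):
            if i not in matched and similar(part, token):
                matched.add(i)
                break
        else:
            # No fuzzy match: treat a short all-caps token as possible
            # initials, e.g. "HO" covering "Hussein" and "Obama".
            if len(token) >= 2 and token.isupper():
                pos = initials.find(token)
                if pos >= 0:
                    matched.update(range(pos, pos + len(token)))
    return len(matched)


def mentions(name: str, text: str, min_parts: int = 2) -> bool:
    """Return True if some window of tokens covers at least `min_parts` parts."""
    parts = name.split()
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    for start in range(len(tokens)):
        window = tokens[start:start + len(parts)]
        if count_matched_parts(parts, window) >= min_parts:
            return True
    return False


if __name__ == "__main__":
    name = "Barack Hussein Obama"
    print(mentions(name, "...The candidate Barack OVama was elected..."))
    print(mentions(name, "...The candidate John McCain lost the election..."))
```

With these settings the five Obama variants above are flagged and the McCain sentence is not. The loop compares every window against every name, though, so as written it would not scale to a huge list and a huge text.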
Of course, this problem has no deterministic solution, but...
What are good heuristics for this kind of search?
What would you do if you had to?