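I'm trying to use the SequenceMatcher class from Python's difflib package to measure string similarity. However, I've run into some odd behavior with it, and I believe my problem may be related to the package's "junk" heuristic, an issue described in detail here. Suffice it to say that I thought I could solve my problem by passing the autojunk flag to my SequenceMatcher, as described in the difflib documentation: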
import difflib

def matches(s1, s2):
    s = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
    match = [s1[i:i+n] for i, j, n in s.get_matching_blocks() if n > 0]
    return match

print matches("they all are white a sheet of spotless paper when they first are born but they are to be scrawled upon and blotted by every goose quill", "you are all white a sheet of lovely spotless paper when you first are born but you are to be scrawled and blotted by every gooses quill")
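For reference, here is a minimal sketch of what the flag is supposed to change, assuming Python 3.2 or later (where both the `autojunk` parameter and the `bpopular` attribute exist). The strings `a` and `b` are made up for illustration. Per the documentation, the popularity heuristic only applies when the second sequence has at least 200 items, and it then ignores any element that makes up more than about 1% of that sequence; inputs shorter than 200 items are unaffected.

```python
import difflib

# Assumes Python 3.2+, where autojunk and the bpopular attribute exist.
a = "ba"
b = "ab" * 150  # 300 items, long enough for the popularity heuristic to apply

default = difflib.SequenceMatcher(None, a, b)                 # heuristic on
no_junk = difflib.SequenceMatcher(None, a, b, autojunk=False)  # heuristic off

# With the heuristic on, 'a' and 'b' each make up far more than 1% of b,
# so both are marked "popular" and skipped when searching for matches.
print(sorted(default.bpopular))  # ['a', 'b']
print(default.ratio())           # 0.0

# With autojunk=False nothing is marked popular, so "ba" is found in b.
print(sorted(no_junk.bpopular))  # []
print(no_junk.find_longest_match(0, len(a), 0, len(b)))  # Match(a=0, b=1, size=2)
```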
But this produces the following error message:
Traceback (most recent call last):
  File "test3.py", line 8, in <module>
    print matches("they all are white a sheet of spotless paper when they first are born but they are to be scrawled upon and blotted by every goose quill", "you are all white a sheet of lovely spotless paper when you first are born but you are to be scrawled and blotted by every gooses quill")
  File "test3.py", line 4, in matches
    s = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
TypeError: __init__() got an unexpected keyword argument 'autojunk'
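For context, the difflib documentation lists `autojunk` as added in Python 2.7.1 (and 3.2), so a `TypeError` about an unexpected keyword `autojunk` is what an older difflib raises. The sketch below is one hedged way to write code that uses the flag when it is available and falls back otherwise; `make_matcher` is a hypothetical helper name, and on the fallback path the heuristic stays enabled, so upgrading Python is the only way to actually turn it off there.

```python
import difflib


def make_matcher(a, b):
    """Hypothetical helper: build a SequenceMatcher with the autojunk
    heuristic disabled if this Python's difflib accepts the keyword."""
    try:
        return difflib.SequenceMatcher(None, a, b, autojunk=False)
    except TypeError:
        # Older difflib (before 2.7.1 / 3.2): no autojunk keyword,
        # so the popularity heuristic remains active.
        return difflib.SequenceMatcher(None, a, b)


def matches(s1, s2):
    """Return the substrings of s1 that are reported as matching s2."""
    s = make_matcher(s1, s2)
    return [s1[i:i + n] for i, j, n in s.get_matching_blocks() if n > 0]


print(matches("abcd", "xbcdy"))  # ['bcd']
```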
Does anyone know how I can pass the autojunk=False flag to SequenceMatcher? I'd be grateful for any advice anyone can offer.