I have been trying out fuzzywuzzy and have hit many cases where it produces wrong results. While debugging, I ran into a get_matching_blocks() scenario that I cannot explain.
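My understanding of get_matching_blocks() is that it returns triples (i, j, n) such that the substring of length n starting at index i in the first string exactly matches the substring of length n starting at index j in the second string.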
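To make that understanding concrete, here is a minimal sketch on two short strings (the strings `a` and `b` are illustrative and not part of my real data). It checks, for each returned triple, that the two slices are equal:

```python
import difflib

# Illustrative inputs only; short enough that no heuristics are involved.
a = "abxcd"
b = "abcd"

sm = difflib.SequenceMatcher(None, a, b)
blocks = sm.get_matching_blocks()

for i, j, n in blocks:
    # Each triple should describe identical slices of a and b.
    assert a[i:i + n] == b[j:j + n]
    print(i, j, n, repr(a[i:i + n]))
```

The last triple is always a sentinel of size 0 positioned at (len(a), len(b)), so an empty final slice is expected.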
>>> hay = """"Find longest matching block in a[alo:ahi] and b[blo:bhi]. If isjunk was omitted or None, find_longest_match() returns (i, j, k) such that a[i:i+k] is equal to b[j:j+k], where alo <= i <= i+k <= ahi and blo <= j <= j+k <= bhi. For all (i', j', k') meeting those conditions, the additional conditions k >= k', i <= i', and if i == i', j <= j' are also met. In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b."""
>>> needle = "meeting those conditions"
>>> needle in hay
True
>>> import difflib
>>> sm = difflib.SequenceMatcher(None, needle, hay)
>>> sm.get_matching_blocks()
[Match(a=5, b=8, size=2), Match(a=24, b=550, size=0)]
>>>
So why does the code above fail to find the matching block?
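One thing I have not ruled out is the `autojunk` parameter of `SequenceMatcher`: when the second sequence is at least 200 items long, the default heuristic treats frequently occurring items as junk, which might explain the result. The sketch below is only an experiment under that assumption; `hay` here is a shortened, hypothetical stand-in for the docstring above (still over 200 characters and containing the needle exactly once).

```python
import difflib

needle = "meeting those conditions"

# Hypothetical stand-in for the long docstring: 200+ characters,
# with the needle appearing exactly once.
hay = (
    "Find longest matching block in a[alo:ahi] and b[blo:bhi]. "
    "For all (i', j', k') meeting those conditions, the additional "
    "conditions k >= k', i <= i', and if i == i', j <= j' are also met. "
    "In other words, of all maximal matching blocks, return one that "
    "starts earliest in a."
)

# Default behaviour, printed for comparison only.
default = difflib.SequenceMatcher(None, needle, hay)
print(default.get_matching_blocks())
print(sorted(default.bpopular))  # characters the heuristic treated as popular

# Same comparison with the popularity heuristic switched off.
sm = difflib.SequenceMatcher(None, needle, hay, autojunk=False)
blocks = sm.get_matching_blocks()
print(blocks)
```

With `autojunk=False`, the first block should cover the whole needle at the position where it occurs in `hay`.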