You can see what it considers to be matching blocks:
>>> difflib.SequenceMatcher(isjunk=lambda x: x == " ", a="a b c", b="a bc").get_matching_blocks()
[Match(a=0, b=0, size=3), Match(a=4, b=3, size=1), Match(a=5, b=4, size=0)]
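The first two tell you that it matched "a b" to "a b" and "c" to "c". (The last one is a trivial zero-length sentinel.)
The question is why "a b" is allowed to match. I found the answer in the code. First, the algorithm finds a bunch of matching blocks by repeatedly calling find_longest_match. What is notable about find_longest_match is that it allows junk characters to sit at the ends of a matching block: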
If isjunk is defined, first the longest matching block is
determined as above, but with the additional restriction that no
junk element appears in the block. Then that block is extended as
far as possible by matching (only) junk elements on both sides. So
the resulting block never matches on junk except as identical junk
happens to be adjacent to an "interesting" match.
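This means that it first treats "a " and "b" as matches: the junk space is allowed at the end of the "a " block because it sits right next to the non-junk match on "a".
Then comes the interesting part: the code does one last check to see whether any of the blocks are adjacent, and merges them if they are. See this comment in the code: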
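One way to check this is to call find_longest_match directly on the same matcher. This is only a sketch: the explicit index ranges are my own choice, and get_matching_blocks drives these calls internally rather than in exactly this form.

```python
import difflib

a, b = "a b c", "a bc"
sm = difflib.SequenceMatcher(isjunk=lambda x: x == " ", a=a, b=b)

# Longest match over the full ranges a[0:5] and b[0:4].
first = sm.find_longest_match(0, len(a), 0, len(b))
# Longest match in what is left to the right of that block.
second = sm.find_longest_match(first.a + first.size, len(a),
                               first.b + first.size, len(b))

print(first, repr(a[first.a:first.a + first.size]))
# Match(a=0, b=0, size=2) 'a '
print(second, repr(a[second.a:second.a + second.size]))
# Match(a=2, b=2, size=1) 'b'
```

The first block is size 2 because the junk space was tacked onto the end of the "a" match; the second block starts exactly where the first one ends.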
# It's possible that we have adjacent equal blocks in the
# matching_blocks list now. Starting with 2.5, this code was added
# to collapse them.
So basically it matches "a " and "b", then merges those two blocks into "a b" and calls that a match, even though the space character is junk.
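The merge step can be sketched roughly like this. collapse_adjacent is a hypothetical helper I wrote to illustrate the idea, not the function difflib exposes, and raw_blocks is my assumption of what the block list looks like just before the merge for this input.

```python
def collapse_adjacent(blocks):
    """Merge (i, j, size) blocks that touch in both sequences.

    Illustrative sketch only; not copied from the difflib source.
    """
    merged = []
    i1 = j1 = k1 = 0
    for i2, j2, k2 in blocks:
        if i1 + k1 == i2 and j1 + k1 == j2:
            # The new block starts exactly where the current one ends
            # in both a and b, so just grow the current block.
            k1 += k2
        else:
            if k1:
                merged.append((i1, j1, k1))
            i1, j1, k1 = i2, j2, k2
    if k1:
        merged.append((i1, j1, k1))
    return merged


# Assumed pre-merge blocks for a="a b c", b="a bc": "a ", "b", "c".
raw_blocks = [(0, 0, 2), (2, 2, 1), (4, 3, 1)]
print(collapse_adjacent(raw_blocks))  # [(0, 0, 3), (4, 3, 1)]
```

The "a " and "b" blocks are adjacent in both strings, so they fuse into a single size-3 block, while "c" stays separate because it starts at index 4 in a but the merged block ends at index 3.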