python - Finding common slices in two Python lists

Question

I want to find the common slices in two Python lists.

For example:

list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

list2 = [0, 0, 3, 4, 5, 0, 0, 8, 9, 0]

This should return two lists: [3, 4, 5] and [8, 9].

Any number or character may appear in place of the 0s.

score 2 · Accepted Answer

Use difflib.SequenceMatcher:

>>> import difflib
>>> list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list2 = [0, 0, 3, 4, 5, 0, 0, 8, 9, 0]
>>> matcher = difflib.SequenceMatcher(a=list1, b=list2)
>>> match = matcher.find_longest_match(0, len(list1), 0, len(list2))
>>> match
Match(a=2, b=2, size=3)
>>> print(list1[match.a:match.a+match.size])
[3, 4, 5]

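SequenceMatcher.find_longest_match() takes a start and an end index for each of its sequences (alo, ahi, blo, bhi), so after finding a match you can call find_longest_match() again on the same matcher object, with the arguments adjusted so that the search starts just after the previous match.

You can do this in a loop. I would write a function for it, like this: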
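As a minimal sketch of that second call, using the lists from the question: the search windows for both sequences are moved to the position right after the first match. The variable names first, second and second_slice are only illustrative.

```python
import difflib

list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [0, 0, 3, 4, 5, 0, 0, 8, 9, 0]
matcher = difflib.SequenceMatcher(a=list1, b=list2)

# First call: search the whole of both sequences.
first = matcher.find_longest_match(0, len(list1), 0, len(list2))

# Second call: same matcher, but alo and blo now point just past the
# first match, so only the remaining tails are searched.
second = matcher.find_longest_match(first.a + first.size, len(list1),
                                    first.b + first.size, len(list2))

second_slice = list1[second.a:second.a + second.size]
print(second_slice)  # [8, 9]
```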

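import difflib
def common_slices(a, b):
    """Yield common slices of a and b, searching forward from each match."""
    matcher = difflib.SequenceMatcher(a=a, b=b)
    sa, sb, size = matcher.find_longest_match(0, len(a), 0, len(b))
    while size != 0:
        # Single-element matches are skipped; only runs of 2+ are yielded.
        if size > 1:
            yield a[sa:sa+size]
        # Search again, starting just after the match that was found.
        sa, sb, size = matcher.find_longest_match(sa+size, len(a), sb+size, len(b))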

>>> list(common_slices(list1, list2))
[[3, 4, 5], [8, 9]]

score 2 · Accepted Answer

>>> from itertools import groupby
>>> from operator import itemgetter
>>> list1
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list2
[0, 0, 3, 4, 5, 0, 0, 8, 9, 0]
>>> [[e[0] for e in v]
     for k,v in groupby(((a ,b, a==b)
             for a,b in zip(list1, list2)), itemgetter(2))
      if k]
[[3, 4, 5], [8, 9]]
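The one-liner above can be hard to read, so here is a sketch of the same idea as a generator. The name aligned_runs is made up for this example. Note that this approach compares the lists position by position, so it only finds slices that sit at the same index in both lists; the last line shows that shifting one list by a single element makes the matches disappear.

```python
from itertools import groupby

def aligned_runs(a, b):
    """Yield runs of consecutive positions i where a[i] == b[i]."""
    # groupby collects neighbouring pairs that share the same key,
    # here the result of the equality test.
    for equal, group in groupby(zip(a, b), key=lambda pair: pair[0] == pair[1]):
        if equal:
            yield [x for x, _ in group]

list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [0, 0, 3, 4, 5, 0, 0, 8, 9, 0]

print(list(aligned_runs(list1, list2)))        # [[3, 4, 5], [8, 9]]
print(list(aligned_runs(list1, [0] + list2)))  # []
```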

If you do want to use difflib, as suggested by @FJ, you should use it this way:

>>> from difflib import SequenceMatcher
>>> [list1[match.a: match.a + match.size]
     for match in SequenceMatcher(None,list1,list2).get_matching_blocks()[:-1]]

But keep in mind that this will be far slower than the linear solution above.
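What the extra cost buys is that difflib does not require the common slices to sit at the same index in both lists. A rough sketch, wrapping get_matching_blocks() in a helper; the name matching_slices and the min_size parameter are assumptions made for this example, and autojunk=False (available since Python 3.2) is passed only to switch off the junk heuristic for long inputs:

```python
from difflib import SequenceMatcher

def matching_slices(a, b, min_size=1):
    """Return the matching blocks of a and b as slices of a."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() always ends with a dummy block of size 0;
    # the size filter drops it along with any blocks below min_size.
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= min_size]

list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [0, 0, 3, 4, 5, 0, 0, 8, 9, 0]

print(matching_slices(list1, list2))        # [[3, 4, 5], [8, 9]]
# The slices are still found when list2 is shifted by one element.
print(matching_slices(list1, [0] + list2))  # [[3, 4, 5], [8, 9]]
```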
