python - Python: compare two strings and return the longest segment they have in common

Question

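As a Python newbie, I've written a working function that compares two strings and searches for the longest substring shared by both. For example, when the function compares "goggle" and "google", it identifies "go" and "gle" as the two common substrings (excluding single letters), but returns only "gle" because it is the longest.

I'd like to know whether any part of my code can be improved or rewritten, since it may be considered lengthy and convoluted. I'd also be glad to see other approaches to the solution. Thanks in advance!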

def longsub(string1, string2):
    sublist = []
    i=j=a=b=count=length=0

    while i < len(string1):
        while j < len(string2):
            if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
                a+=1
                b+=1
                count+=1
            else:
                if count > 0:
                    sublist.append(string1[i:a])
                count = 0
                j+=1
                b=j
                a=i
        j=b=0
        i+=1
        a=i

    while len(sublist) > 1:
        for each in sublist:
            if len(each) >= length:
                length = len(each)
            else:
                sublist.remove(each)

    return sublist[0]

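Edit: comparing "goggle" and "google" may have been a bad example, since they are the same length and the longest common segment sits at the same position in both. Actual input would be closer to "xabcdkejp" and "zkdieaboabcd". The correct output should be "abcd".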

score 4 · Accepted Answer


Actually, there happens to be a function for this in the standard library: difflib.SequenceMatcher.find_longest_match
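A minimal sketch of how that call might be wrapped, using the inputs from the question's edit. The function name `longest_common_segment` is made up for illustration, and `autojunk=False` is an assumption on my part so that the junk heuristic never interferes on long inputs:

```python
from difflib import SequenceMatcher


def longest_common_segment(first, second):
    """Return the longest contiguous block shared by both strings."""
    # autojunk=False turns off the popularity heuristic used on long sequences
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    # Search the full range of both strings
    match = matcher.find_longest_match(0, len(first), 0, len(second))
    # match.a is the start in `first`, match.size the length of the block
    return first[match.a:match.a + match.size]


print(longest_common_segment("xabcdkejp", "zkdieaboabcd"))  # abcd
```

If the strings share nothing, match.size is 0 and the function returns an empty string.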

answered 2013-03-19T16:43:22.017

score 2 · Accepted Answer

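Edit: this algorithm only works when the longest segment sits at the same indices in both words

You can get away with just one loop. Use helper variables. Something like this (needs refactoring) http://codepad.org/qErRBPav: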

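word1 = "google"
word2 = "goggle"

longestSegment = ""
tempSegment = ""

# Compare only the overlapping indices so unequal lengths do not raise IndexError
for i in range(min(len(word1), len(word2))):
    if word1[i] == word2[i]:
        # Extend the current run of matching characters
        tempSegment += word1[i]
    else:
        # A mismatch ends the current run
        tempSegment = ""

    if len(tempSegment) > len(longestSegment):
        longestSegment = tempSegment

print(longestSegment)  # "gle"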

Edit: using mgilson's suggestion of find_longest_match (works for segments at different positions):

from difflib import SequenceMatcher

word1 = "google"
word2 = "goggle"

s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))

# match.a is the start of the match in word1; match.b is its start in word2
print(word1[match.a:match.a + match.size])  # "gle"
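For comparison, here is a rough sketch of the same search without difflib, using the usual dynamic-programming table for longest common substring. This is not what SequenceMatcher does internally, just one possible alternative; the name `longest_common_substring` is made up for illustration:

```python
def longest_common_substring(s1, s2):
    """Return the longest contiguous substring found in both s1 and s2."""
    best_len = 0  # length of the best match seen so far
    best_end = 0  # index in s1 just past the end of that match
    # prev[j] = length of the common suffix of s1[:i-1] and s2[:j]
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the match that ended at the previous pair of characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return s1[best_end - best_len:best_end]


print(longest_common_substring("xabcdkejp", "zkdieaboabcd"))  # abcd
```

It runs in time proportional to len(s1) * len(s2) and keeps only two rows of the table in memory. When several substrings tie for the longest, it returns the one that ends earliest in s1.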
