Longest Common Subsequence (LCS)
问题是:给定两个序列A
和,找到在和 中B
都找到的最长子序列。例如,给定和,最长公共子序列是。A
B
A = "peterparker"
B = "spiderman"
"pera"
有人可以解释这个Longest Common Subsequence
算法吗?
def longestCommonSubsequence(A: List, B: List) -> int:
# n = len(A)
# m = len(B)
indeces_A = collections.defaultdict(list)
# O(n)
for i, a in enumerate(A):
indeces_A[a].append(i)
# O(n)
for indeces_a in indeces_A.values():
indeces_a.reverse()
# O(m)
indeces_A_filtered = []
for b in B:
indeces_A_filtered.extend(indeces_A[b])
# The length of indeces_A_filtered is at most n*m, but in practice it's more like O(m) or O(n) as far as I can tell.
iAs = []
# O(m log m) in practice as far as I can tell.
for iA in indeces_A_filtered:
j = bisect.bisect_left(iAs, iA)
if j == len(iAs):
iAs.append(iA)
else:
iAs[j] = iA
return len(iAs)
所写的算法找到 的长度longest common subsequence
,但可以修改以找到longest common subsequence
完全的长度。
当我在 leetcode link上寻找一个等效问题的最快 python 解决方案时,我发现了这个算法。该算法是该问题最快的 Python 解决方案(40 毫秒),而且它似乎也具有时间复杂度,这比大多数其他解决方案O(m log m)
的时间复杂度要好得多。O(m*n)
我不完全理解它为什么会起作用,并尝试到处寻找已知算法来Longest Common Subsequence
解决问题以找到其他提及它的地方,但找不到类似的东西。我能找到的最接近的是Hunt–Szymanski algorithm
链接,据说O(m log m)
在实践中也有,但似乎不是相同的算法。
我的理解是:
indeces_a
被反转,以便在iAs
for 循环中保留较小的索引(这在执行下面的演练时更加明显。)- 据我所知,
iAs
for 循环找到了longest increasing subsequence
ofindeces_A_filtered
。
谢谢!
A = "peterparker"
这是算法的演练,例如B = "spiderman"
01234567890
A = "peterparker"
B = "spiderman"
indeces_A = {'p':[0,5], 'e':[1,3,9], 't':[2], 'r':[4,7,10], 'a':[6], 'k':[8]}
# after reverse
indeces_A = {'p':[5,0], 'e':[9,3,1], 't':[2], 'r':[10,7,4], 'a':[6], 'k':[8]}
# -p- --e-- ---r-- a
indeces_A_filtered = [5,0, 9,3,1, 10,7,4, 6]
# the `iAs` loop
iA = 5
j = 0
iAs = [5]
iA = 0
j = 0
iAs = [0]
iA = 9
j = 1
iAs = [0,9]
iA = 3
j = 1
iAs = [0,3]
iA = 1
j = 1
iAs = [0,1]
iA = 10
j = 2
iAs = [0,1,10]
iA = 7
j = 2
iAs = [0,1,7]
iA = 4
j = 2
iAs = [0,1,4]
iA = 6
j = 3
iAs = [0,1,4,6] # corresponds to indices of A that spell out "pera", the LCS
return len(iAs) # 4, the length of the LCS