我有这段代码,它计算随机字符串之间的最长公共子序列,以查看如何准确地重建输入的未知区域。为了获得好的统计数据,我需要对其进行多次迭代,但我当前的 python 实现太慢了。即使使用pypy,它目前也需要 21 秒才能运行一次,理想情况下我希望运行 100 次。
#!/usr/bin/python
import random
import itertools
#test to see how many different unknowns are compatible with a set of LCS answers.
def lcs(x, y):
n = len(x)
m = len(y)
# table is the dynamic programming table
table = [list(itertools.repeat(0, n+1)) for _ in xrange(m+1)]
for i in range(n+1): # i=0,1,...,n
for j in range(m+1): # j=0,1,...,m
if i == 0 or j == 0:
table[i][j] = 0
elif x[i-1] == y[j-1]:
table[i][j] = table[i-1][j-1] + 1
else:
table[i][j] = max(table[i-1][j], table[i][j-1])
# Now, table[n, m] is the length of LCS of x and y.
return table[n][m]
def lcses(pattern, text):
return [lcs(pattern, text[i:i+2*l]) for i in xrange(0,l)]
l = 15
#Create the pattern
pattern = [random.choice('01') for i in xrange(2*l)]
#create text start and end and unknown.
start = [random.choice('01') for i in xrange(l)]
end = [random.choice('01') for i in xrange(l)]
unknown = [random.choice('01') for i in xrange(l)]
lcslist= lcses(pattern, start+unknown+end)
count = 0
for test in itertools.product('01',repeat = l):
test=list(test)
testlist = lcses(pattern, start+test+end)
if (testlist == lcslist):
count += 1
print count
我尝试将它转换为 numpy,但我一定做得很糟糕,因为它实际上运行得更慢。这段代码能以某种方式加速吗?
更新。在下面的评论之后,如果直接使用循环会更好,它会在相同长度的所有子列表lcses
之间给出 LCS 。是否有可能以某种方式修改经典的动态编程 LCS 算法来做到这一点?pattern
text