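I am iterating over 3 words, each about 5 million characters long, and I want to find the 20-character sequences that identify each word. That is, I want to find all sequences of length 20 in a word that are unique to that word. My problem is that the code I have written takes a very long time to run. I have not even managed to finish a run for a single word.
The function below takes a list of dictionaries, where each dictionary contains every possible 20-character word, together with its location, from one of the 5-million-character words.
If anyone has an idea how to optimize this I would be very grateful; I do not know how to continue.
Here is a sample of my code: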
def findUnique(list):
    # Takes a list of dictionaries, compares each element in each dictionary
    # with the elements of the other dictionaries, puts all unique elements
    # in new dictionaries, and finally puts the new dictionaries in a list.
    # The result is a list with (in this case) 3 dictionaries containing all
    # unique sequences and their locations from each string.
    dicList=[]
    listlength=len(list)
    s=0
    valuelist=[]
    for i in list:
        j=i.values()
        valuelist.append(j)
    while s<listlength:
        currdic=list[s]
        dic={}
        for key in currdic:
            currval=currdic[key]
            test=True
            n=0
            while n<listlength:
                if n!=s:
                    if currval in valuelist[n]: # this is where it takes too much time
                        n=listlength
                        test=False
                    else:
                        n+=1
                else:
                    n+=1
            if test:
                dic[key]=currval
        dicList.append(dic)
        s+=1
    return dicList
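
For comparison, here is a minimal sketch of a set-based variant of the same comparison. It assumes, as the code above suggests, that each dictionary maps a location to a 20-character sequence and that the sequences are hashable strings. The name `find_unique_fast` and the short sample sequences are hypothetical, chosen only for illustration. The marked line above searches every value of another dictionary for each lookup, while a set lookup takes roughly constant time on average; the cost is the extra memory needed to hold the sets.

```python
from collections import Counter


def find_unique_fast(dict_list):
    """Hypothetical set-based variant of findUnique (illustration only).

    Assumes each dictionary maps a location to a hashable sequence.
    """
    # One set of sequences per dictionary, so a sequence repeated inside
    # the same dictionary is counted only once for that dictionary.
    value_sets = [set(d.values()) for d in dict_list]

    # Count in how many dictionaries each sequence occurs.
    seen_in = Counter()
    for values in value_sets:
        seen_in.update(values)

    # Keep only the entries whose sequence occurs in exactly one dictionary.
    return [
        {key: val for key, val in d.items() if seen_in[val] == 1}
        for d in dict_list
    ]


# Tiny example with 4-character sequences instead of 20-character ones.
sample = [
    {0: "AAAA", 1: "CCCC"},
    {0: "CCCC", 1: "GGGG"},
    {0: "TTTT", 1: "GGGG"},
]
print(find_unique_fast(sample))  # [{0: 'AAAA'}, {}, {0: 'TTTT'}]
```

Like the original function, this keeps a sequence that repeats inside one dictionary as long as it never appears in any other dictionary.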