我想找到 1)字符串中最常见的所有模式 2)最多有 d 不匹配。
对于这个给定的任务,我已经实现了计算给定模式在具有 d 不匹配的字符串中出现的次数的函数。该算法的思想基于使用字符串子模式的位掩码和给定模式的位掩码的卷积。它产生正确的结果。这里是这个算法的代码:
def create_bit_mask(letter, text):
buf_array=[]
for c in text:
if c==letter:
buf_array.append(1)
else:
buf_array.append(0)
return buf_array
def convolution(bit_mask1, bit_mask2):
return sum(b*q for b,q in zip(bit_mask1, bit_mask2))
def number_of_occurances_with_at_most_hamming_distance(genome,pattern,hamming_distance):
alphabet=["A","C","G","T"]
matches=0
matrix_of_bit_arrays_for_pattern=[]
matrix_of_bit_arrays_for_genome=[]
buf_output=0
buf=0
positions=""
for symbol in alphabet:
matrix_of_bit_arrays_for_pattern.append(create_bit_mask(symbol,pattern))
matrix_of_bit_arrays_for_genome.append(create_bit_mask(symbol, genome))
for i in xrange(len(genome)-len(pattern)+1):
buf_debug=[]
buf=sum(convolution(bit_mask_pattern,bit_mask_genome[i:i+len(pattern)]) for bit_mask_pattern, bit_mask_genome in zip(matrix_of_bit_arrays_for_pattern,matrix_of_bit_arrays_for_genome))
hamming=len(pattern)-buf
if hamming<=hamming_distance:
buf_output+=1
#print "current window: ", genome[i:i+len(pattern)], "pattern :", pattern,"number of mismatches : ", hamming, " @ position : ",i
return buf_output
鉴于上述功能,该任务的解决方案应该相当简单,即
def fuzzy_frequency():
genome="CACAGTAGGCGCCGGCACACACAGCCCCGGGCCCCGGGCCGCCCCGGGCCGGCGGCCGCCGGCGCCGGCACACCGGCACAGCCGTACCGGCACAGTAGTACCGGCCGGCCGGCACACCGGCACACCGGGTACACACCGGGGCGCACACACAGGCGGGCGCCGGGCCCCGGGCCGTACCGGGCCGCCGGCGGCCCACAGGCGCCGGCACAGTACCGGCACACACAGTAGCCCACACACAGGCGGGCGGTAGCCGGCGCACACACACACAGTAGGCGCACAGCCGCCCACACACACCGGCCGGCCGGCACAGGCGGGCGGGCGCACACACACCGGCACAGTAGTAGGCGGCCGGCGCACAGCC"
length=10
hamming_distance=2
for i in range(len(genome)-length):
#print genome[i:i+10], " number of occurances: ", number_of_occurances_with_at_most_hamming_distance(genome, genome[i:i+10],hamming_distance)
print (genome[i:i+length],number_of_occurances_with_at_most_hamming_distance(genome, genome[i:i+length],hamming_distance))
但是,上面的代码没有找到以下子字符串:
GCACACAGAC
你能打我吗,为什么?我不希望您发布正确的代码,而是给我一个想法,我的错误在哪里(我认为错误可能在第二个函数中)。
PS 我确实意识到我必须在 Stepic 在线课程上解决以下任务,但是没有来自学习组的在线社团的反馈,我已经在 StackOverflow 上发布了我的代码。