I have two sequence files. Say ham1.txt:
AAACCCTTTGGG
AGGTACTTTTTT
TCTCTTTTTTTT
etc.
ham2.txt:
AAACCCTTTGGG
GAGAGGGAGGGC
AGGTACTTTTTT
CTCTTAATTTCC
TCTCTTTTTTTT
GTTTTTAAAAAA
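I want to match each sequence in ham1.txt with the sequence in ham2.txt that has the smallest Hamming distance to it. My Python code prints the Hamming distance for every pair, but I only want the best-matching pair. Here is my code: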
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

with open('ham1.txt', 'r') as file1:
    for s1 in file1:
        s1 = s1.strip()  # drop the trailing newline before comparing
        with open('ham2.txt', 'r') as file2:
            for s2 in file2:
                s2 = s2.strip()
                dist = hamming_distance(s1, s2)
                print(s1, s2, dist)  # prints every pair, not just the best one
Can you suggest an edit? Thanks.
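One possible way to keep only the closest match is sketched below. It assumes both files fit in memory and that every sequence has the same length. The helper names `read_sequences` and `best_matches` are made up for illustration and are not from any library. The sample lists just repeat the sequences shown above so the snippet runs without the files.

```python
def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def read_sequences(path):
    """Hypothetical helper: one sequence per line, blank lines skipped."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def best_matches(queries, candidates):
    """Pair each query with the candidate at the smallest Hamming distance."""
    results = []
    for query in queries:
        # min() keeps the first candidate with the lowest distance,
        # so ties go to whichever sequence appears earlier in the list.
        best = min(candidates, key=lambda cand: hamming_distance(query, cand))
        results.append((query, best, hamming_distance(query, best)))
    return results


# Sample data copied from the question; with real files this could be
# read_sequences('ham1.txt') and read_sequences('ham2.txt') instead.
ham1 = ["AAACCCTTTGGG", "AGGTACTTTTTT", "TCTCTTTTTTTT"]
ham2 = [
    "AAACCCTTTGGG",
    "GAGAGGGAGGGC",
    "AGGTACTTTTTT",
    "CTCTTAATTTCC",
    "TCTCTTTTTTTT",
    "GTTTTTAAAAAA",
]

for s1, s2, dist in best_matches(ham1, ham2):
    print(s1, s2, dist)
```

Reading ham2.txt once into a list also avoids reopening the file for every line of ham1.txt, which the nested `with` in the original loop does.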