最后我做了自己的Python函数,觉得不妨分享一下。
它需要一个有缺口的比对肽序列和相应的未比对的核苷酸序列,并给出一个比对的核苷酸序列:
功能
def gapsFromPeptide( peptide_seq, nucleotide_seq ):
""" Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment)
- peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
- nucleotide_seq is an un-aligned dna sequence whose codons translate to peptide seq"""
def chunks(l, n):
""" Yield successive n-sized chunks from l."""
for i in xrange(0, len(l), n):
yield l[i:i+n]
codons = [codon for codon in chunks(nucleotide_seq,3)] #splits nucleotides into codons (triplets)
gappedCodons = []
codonCount = 0
for aa in peptide_seq: #adds '---' gaps to nucleotide seq corresponding to peptide
if aa!='-':
gappedCodons.append(codons[codonCount])
codonCount += 1
else:
gappedCodons.append('---')
return(''.join(gappedCodons))
用法
>>> unaligned_dna_seq = 'ATGATGATG'
>>> aligned_peptide_seq = 'M-MM'
>>> aligned_dna_seq = gapsFromPeptide(aligned_peptide_seq, unaligned_dna_seq)
>>> print(aligned_dna_seq)
ATG---ATGATG