I have a program for creating FASTA sequence files.
# Aligned sequences, all of the same length
sequences = ['ARIMALTHNAEYSDSFTAL','ARIMFLTHNFEYSESFTAL','AHIMNPTENAEYHESFTAL','AHIMNPTENTEYWDSFTAL','AHIMNDTHNFEYHDSFTAL','AHIMNDTNNTEYWESFTAL','ARIMFDTENAEYHDSFTAL','AHIMADTNNTEYWDSFTAL','ARIMFLTENTEYHESFTAL']
l = len(sequences[0])
# One set of observed residues per position
my_residues = [set() for _ in range(l)]
for h in sequences:
    for i, x in enumerate(h):
        my_residues[i].add(x)
my_residues = [list(x) for x in my_residues]
print(my_residues)
This gives output like the following:
[['A'], ['H', 'R'], ['I'], ['M'], ['A', 'N', 'F'], ['P', 'L', 'D'], ['T'], ['H', 'E', 'N'], ['N'], ['A', 'T', 'F'], ['E'], ['Y'], ['H', 'S', 'W'], ['E', 'D'], ['S'], ['F'], ['T'], ['A'], ['L']]
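But I want the output to work differently: if a position contains more than one residue, that position's group should contain all of the amino acid residues. So the output should look like this: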
[['A'], ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'], ['I'], ['M'], ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'], ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'], ['T'], ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y'],...... ['F'], ['T'], ['A'], ['L']]
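To make the desired behavior concrete, here is a minimal sketch of one way it could be written. The names `ALL_RESIDUES` and `expand_variable_positions` are hypothetical and chosen only for illustration. The sketch assumes that "all amino acid residues" means the 20 standard one-letter codes shown in the desired output, and that every sequence has the same length (`zip` would otherwise silently stop at the shortest one).

```python
# The 20 standard amino acids in one-letter code (assumed from the desired output).
ALL_RESIDUES = list("ACDEFGHIKLMNPQRSTVWY")


def expand_variable_positions(sequences):
    """Return one residue list per position.

    A position where every sequence agrees keeps its single residue;
    a position with more than one observed residue gets all 20.
    """
    result = []
    # zip(*sequences) walks the aligned sequences column by column.
    for column in zip(*sequences):
        if len(set(column)) > 1:
            # Copy the list so positions do not share one mutable object.
            result.append(list(ALL_RESIDUES))
        else:
            result.append([column[0]])
    return result


sequences = ['ARIMALTHNAEYSDSFTAL', 'ARIMFLTHNFEYSESFTAL', 'AHIMNPTENAEYHESFTAL',
             'AHIMNPTENTEYWDSFTAL', 'AHIMNDTHNFEYHDSFTAL', 'AHIMNDTNNTEYWESFTAL',
             'ARIMFDTENAEYHDSFTAL', 'AHIMADTNNTEYWDSFTAL', 'ARIMFLTENTEYHESFTAL']

my_residues = expand_variable_positions(sequences)
print(my_residues)
```

Because the check is simply whether a column holds more than one distinct residue, the order in which the set returns its elements no longer matters, and the expanded positions always come out in the same alphabetical order.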