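In silico cleavage of peptides and proteins is a common task, and it has already been implemented. For example, you can use pyteomics (which I develop) as follows: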
In [1]: from pyteomics.parser import cleave, expasy_rules
In [2]: cleave('GGRGAGSAAWSAAVRYLTMMSSLYQT', expasy_rules['trypsin'])
Out[2]: {'GAGSAAWSAAVR', 'GGR', 'YLTMMSSLYQT'}
As you can see, popular cleavage rules are already encoded. But you can also supply your own, as a regular expression:
In [3]: cleave('GGRGAGSAAWSAAVRYLTMMSSLYQT', '[KR]?[^P].*?[KR](?!P)')
Out[3]: {'GAGSAAWSAAVR', 'GGR', 'YLTMMSSLYQT'}
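To make the custom rule less opaque: it is a regular expression, and each match ends at a position where the sequence is cut. The following is a minimal sketch using only the standard `re` module. The helper names `cleavage_sites` and `split_at` are made up for illustration and are not part of pyteomics.

```python
import re

def cleavage_sites(sequence, rule):
    """Return the indices where `rule` cuts `sequence` (the end of each regex match)."""
    return [m.end() for m in re.finditer(rule, sequence)]

def split_at(sequence, sites):
    """Cut `sequence` at the given indices, dropping empty pieces."""
    bounds = [0] + sites + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

# Trypsin-like rule: cut after K or R, unless the next residue is P.
rule = r'[KR]?[^P].*?[KR](?!P)'
seq = 'GGRGAGSAAWSAAVRYLTMMSSLYQT'

sites = cleavage_sites(seq, rule)   # [3, 15]
pieces = split_at(seq, sites)       # ['GGR', 'GAGSAAWSAAVR', 'YLTMMSSLYQT']

# The negative lookahead (?!P) suppresses the cut in front of proline:
# in 'AKPAKA' the first K is followed by P, so only the second K is a site.
proline_sites = cleavage_sites('AKPAKA', rule)  # [5]
```

The lazy `.*?` is what makes each match stop at the first admissible K or R instead of running to the last one.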
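Here is the relevant section of the tutorial.
If you are interested in how it is currently implemented, have a look at the source code: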
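import re
from collections import deque
from itertools import chain

from pyteomics.auxiliary import memoize  # caching decorator shipped with pyteomics

@memoize()
def cleave(sequence, rule, missed_cleavages=0, overlap=False):
    """
    Docstring omitted here for brevity.
    """
    peptides = set()
    # Bounded window holding the most recent cleavage positions.
    cleavage_sites = deque([0], maxlen=missed_cleavages+2)
    # Every regex match ends at a cleavage site; None stands for the sequence end.
    for i in chain(map(lambda x: x.end(), re.finditer(rule, sequence)),
                   [None]):
        cleavage_sites.append(i)
        # Emit every peptide that ends at the newest site.
        for j in range(0, len(cleavage_sites)-1):
            peptides.add(sequence[cleavage_sites[j]:cleavage_sites[-1]])
        if overlap and i not in {0, None}:
            peptides.update(
                cleave(sequence[i:], rule, missed_cleavages, overlap))

    if '' in peptides:
        peptides.remove('')
    return peptides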
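The core of the function above is the bounded deque: it keeps the last `missed_cleavages + 2` cut positions, so every new site yields the peptides that span at most `missed_cleavages` skipped cuts. Below is a simplified standalone sketch of that idea, without memoization and without the `overlap` branch. The name `cleave_sketch` is hypothetical, and this is an illustration rather than the library code.

```python
import re
from collections import deque
from itertools import chain

def cleave_sketch(sequence, rule, missed_cleavages=0):
    """Simplified illustration of the sliding-window digestion shown above."""
    peptides = set()
    # Keep only the last (missed_cleavages + 2) cut positions.
    window = deque([0], maxlen=missed_cleavages + 2)
    ends = (m.end() for m in re.finditer(rule, sequence))
    # A trailing None makes the last slice run to the end of the sequence.
    for site in chain(ends, [None]):
        window.append(site)
        # Pair every earlier position still in the window with the newest site.
        for start in list(window)[:-1]:
            peptides.add(sequence[start:site])
    # An empty peptide appears when the sequence ends exactly at a cut.
    peptides.discard('')
    return peptides

rule = r'[KR]?[^P].*?[KR](?!P)'
seq = 'GGRGAGSAAWSAAVRYLTMMSSLYQT'

no_missed = cleave_sketch(seq, rule)
# {'GGR', 'GAGSAAWSAAVR', 'YLTMMSSLYQT'}

one_missed = sorted(cleave_sketch(seq, rule, missed_cleavages=1), key=len)
# ['GGR', 'YLTMMSSLYQT', 'GAGSAAWSAAVR',
#  'GGRGAGSAAWSAAVR', 'GAGSAAWSAAVRYLTMMSSLYQT']
```

With `missed_cleavages=1` the window holds three positions, which is why each pair of neighbouring fully cleaved peptides also shows up joined.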