0

I have this string:

str = CTGGCATAACAAGACAAAAACAAAAGCAATAAATGCTGAAAAAACAAAATGCCGTGATCGTTTGTAATACTGGAACATAGTCATGATGAATGAAGGTTTCTGAACCTGAAGAACGACCTGAAAAAGTCAAACCGCAAGAATATCACGACGCAGTGAACCAGAATAGCAACGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTACGAAGCGTGGGGCACAGGAGATACATCTCCAGTAAGATGGCAACGTAATCGCGGGCTTCTTTTTTAAGATCAAAAGATTGCGGGGCAAAGAGCCAGTTTTCCATCAGGCCGGAAATATAGCCGCGCATAATAATTGCTGCGCGACGCGTCATTAAATCCGCAGGCAACATTTTCGCTTCAATACAATGTTTTAACGTTTGTTCTATACGGTCATAACTTTCCAGACAGAGATTACGTTGTGCCTGTTGCACAACAGCCATTTCTCCGACAAATTCGCATTTGTGGAATATAATCTCCATCAATAATCGACGCCGTTCTTCTGTCACCGTGGATTCAAGAACATGAATTAATATCTCTCTTAATACTGAGAGTGGATCGCCAGGGAATTTTGCCTGATACTCAAGCTCTAGTTCACCAATATTGGATTCTGACAGTTCCCAGATCTCACTGAACAAATCCGACTTGTCTTTAAAATGCCAGTAGATTGCACCGCGCGTAACGCCAGCTGCTTTTGCAATCTCGCCCAGCGAGGTGGATGATACCCCCTGCTGTGAGAAAAGACGTAGAGCCACATCGAGGATGTGTTGGCGCGTTTCTTGCGCTTCTTGTTTGGTTTTTCGTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGAATGTATGTACCATAGCACGACGATAATATAAACGCAGCAATGGGTTTATTAACTTTTGACCATTGACCAATTTGAAATCGGACACTCGAGGTTTACATA

I want to cut this string into multiple substrings, e.g. str[0:19], str[1:20], str[2:21], str[3:22], and so on.

4

3 Answers

3

Use string slicing:

>>> strs = "CTGGCATAACAAGACAAAAACAAAAGCAATAAATGCTGAAAAAACAAAATGCCGTGATCGTTTGTAATACTGGAACATAGTCATGATGAATGAAGGTTTCTGAACCTGAAGAACGACCTGAAAAAGTCAAACCGCAAGAATATCACGACGCAGTGAACCAGAATAGCAACGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTACGAAGCGTGGGGCACAGGAGATACATCTCCAGTAAGATGGCAACGTAATCGCGGGCTTCTTTTTTAAGATCAAAAGATTGCGGGGCAAAGAGCCAGTTTTCCATCAGGCCGGAAATATAGCCGCGCATAATAATTGCTGCGCGACGCGTCATTAAATCCGCAGGCAACATTTTCGCTTCAATACAATGTTTTAACGTTTGTTCTATACGGTCATAACTTTCCAGACAGAGATTACGTTGTGCCTGTTGCACAACAGCCATTTCTCCGACAAATTCGCATTTGTGGAATATAATCTCCATCAATAATCGACGCCGTTCTTCTGTCACCGTGGATTCAAGAACATGAATTAATATCTCTCTTAATACTGAGAGTGGATCGCCAGGGAATTTTGCCTGATACTCAAGCTCTAGTTCACCAATATTGGATTCTGACAGTTCCCAGATCTCACTGAACAAATCCGACTTGTCTTTAAAATGCCAGTAGATTGCACCGCGCGTAACGCCAGCTGCTTTTGCAATCTCGCCCAGCGAGGTGGATGATACCCCCTGCTGTGAGAAAAGACGTAGAGCCACATCGAGGATGTGTTGGCGCGTTTCTTGCGCTTCTTGTTTGGTTTTTCGTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGAATGTATGTACCATAGCACGACGATAATATAAACGCAGCAATGGGTTTATTAACTTTTGACCATTGACCAATTTGAAATCGGACACTCGAGGTTTACATA"
>>> substrings = [strs[i:i+19] for i in range(len(strs) - 19 + 1)]
>>> substrings
['CTGGCATAACAAGACAAAA', 'TGGCATAACAAGACAAAAA', 'GGCATAACAAGACAAAAAC',...]
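The loop bound is easier to check on a short string. The following is a minimal sketch, not part of the original answer: the helper name `windows` and the 10-character sample are illustrative assumptions. A string of length n has exactly n - k + 1 windows of length k; starting a slice any later only produces truncated tails.

```python
def windows(seq, k):
    """Return every length-k substring of seq, shifting by one character."""
    # A string of length n has n - k + 1 full windows (none if k > n,
    # because range() of a non-positive number is empty).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


sample = "CTGGCATAAC"  # first 10 characters of the sequence in the question
print(windows(sample, 4))
# ['CTGG', 'TGGC', 'GGCA', 'GCAT', 'CATA', 'ATAA', 'TAAC']
```

With k = 19, `windows(strs, 19)` would give the same list as the comprehension above.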
answered 2013-06-14T17:18:10.813
1

If you are looking to extract every 19-nucleotide sequence from the strand, this will do it:

>>> SEQ_LEN = 19
>>> [strs[i:i+SEQ_LEN] for i in range(len(strs) - SEQ_LEN + 1)]

However, this is not very memory-efficient, because it builds a list of all the subsequences. What is it for?


Another way to process each subsequence of N nucleotides could be:

# range is lazy on Python 3; on Python 2 use xrange to avoid building a list of indices
for seq in (strs[i:i+SEQ_LEN] for i in range(len(strs) - SEQ_LEN + 1)):
    do_something_with(seq)

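For your specific problem, do_something_with would mostly update the PWM (position weight matrix) using the nucleotide positions. If you have trouble with that, feel free to post another question ;)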
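To make that concrete, here is a rough sketch of what the update step might look like if the PWM starts out as a table of raw per-position counts. The names `iter_windows` and `count_matrix`, and the choice to count every window of one sequence, are assumptions for illustration only. A real PWM is usually built from a set of aligned sites and then normalized, which this sketch leaves out.

```python
def iter_windows(seq, k):
    """Lazily yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_matrix(seq, k, alphabet="ACGT"):
    """Count how often each nucleotide occurs at each window position.

    Assumes seq contains only characters from `alphabet`; any other
    character (e.g. 'N') would raise a KeyError.
    """
    # counts[base][pos] = number of windows with `base` at offset `pos`
    counts = {base: [0] * k for base in alphabet}
    for window in iter_windows(seq, k):
        for pos, base in enumerate(window):
            counts[base][pos] += 1
    return counts


counts = count_matrix("CTGGCATAAC", 4)
print(counts["C"])  # [2, 1, 1, 2]
```

Because `iter_windows` is a generator, only one window is held in memory at a time, which is the point of the loop shown above.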

answered 2013-06-14T17:27:02.330
1
chopped_str = []
# len(str) - 19 + 1 start positions, so the last full 19-character window is included
for i in range(len(str) - 19 + 1):
    chopped_str.append(str[i:i+19])
answered 2013-06-14T17:18:02.253