-4

I have a DNA sequence like this:

seq='ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'

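I want to print every run of a consecutively repeated nucleotide in Python, but only when the nucleotide repeats more than twice in a row.

For this sequence, the output should be: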

TTTTT
AAA
CCCCCC
GGGG
4

3 Answers

4

You may want to look at itertools.groupby.

An example usage:

import itertools

# groupby yields one group per run of identical adjacent characters
for _, group in itertools.groupby(seq):
    group = ''.join(group)
    if len(group) > 2:
        print(group)
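
If you need the runs as a list rather than printed output, the same idea can be wrapped in a function. This is only a minimal sketch: the name `find_runs` and the `min_len` parameter are made up for illustration and are not part of itertools.

```python
import itertools

def find_runs(seq, min_len=3):
    """Return runs of identical adjacent characters that have at least min_len characters."""
    runs = []
    # Each group produced by groupby is one run of the same character
    for _, group in itertools.groupby(seq):
        run = ''.join(group)
        if len(run) >= min_len:
            runs.append(run)
    return runs

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
print(find_runs(seq))
```

Passing a different `min_len` changes the threshold, for example `find_runs(seq, min_len=5)` to keep only runs of five or more.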
answered 2012-10-23T04:26:21.250
1

With a regular expression that uses a backreference, together with the findall method, you can find the repeats quite easily.

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'

import re
hits = re.findall(r'(([A-Z])\2\2+)', seq)  # group 1 is the whole run, group 2 is the repeated character
print([hit[0] for hit in hits])            # keep only the whole run from each (run, character) tuple

['TTTTT', 'AAA', 'CCCCCC', 'GGGG']
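
A variation, in case the positions of the runs are also useful: `re.finditer` yields match objects, so the start index is available and no tuple unpacking is needed. This is a sketch under assumptions of my own: the pattern is narrowed to the four DNA letters, and the names `run_pattern` and `find_runs_with_positions` are hypothetical.

```python
import re

# ([ACGT]) captures one nucleotide; \1{2,} requires that same character
# at least two more times, so every match is a run of three or more.
run_pattern = re.compile(r'([ACGT])\1{2,}')

def find_runs_with_positions(seq):
    """Return (start_index, run) pairs for each run of three or more."""
    return [(m.start(), m.group(0)) for m in run_pattern.finditer(seq)]

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
for start, run in find_runs_with_positions(seq):
    print(start, run)
```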
answered 2012-10-23T04:44:08.347
0
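seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
while seq:
    value = seq[0]
    repeats = 1
    # Count how many times the first character repeats at the start of seq;
    # the bounds check avoids an IndexError when the sequence ends in a run
    while repeats < len(seq) and seq[repeats] == value:
        repeats += 1
    # Print only runs longer than two, as the question asks
    if repeats > 2:
        print(value * repeats)
    # Drop the run just processed and continue with the rest
    seq = seq[repeats:]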
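
The loop above re-slices seq after every run, which copies the rest of the string each time. For long sequences, a single pass with indices may be preferable. A minimal sketch, where the generator name `iter_runs` and the `min_len` parameter are my own and purely illustrative:

```python
def iter_runs(seq, min_len=3):
    """Yield runs of identical adjacent characters with at least min_len characters."""
    start = 0  # index where the current run begins
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the string or when the character changes
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                yield seq[start:i]
            start = i

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
for run in iter_runs(seq):
    print(run)
```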
answered 2012-10-23T04:52:52.853