I have a DNA sequence like
seq='ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
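In Python, I want to print the consecutively repeated nucleotides (only where a nucleotide occurs more than twice in a row).
For this sequence the output should be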
TTTTT
AAA
CCCCCC
GGGG
You might want to look at itertools.groupby.
An example usage:
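import itertools

for _, group in itertools.groupby(seq):
    group = ''.join(group)  # groupby yields each run as an iterator of characters
    if len(group) > 2:      # keep only runs of three or more
        print(group)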
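Wrapped in a function, the same groupby idea can return the runs instead of printing them. This is a minimal sketch; the name `repeated_runs` and the `min_len` parameter are hypothetical, chosen here for illustration, with `min_len=3` matching "more than twice in a row".

```python
from itertools import groupby

def repeated_runs(seq, min_len=3):
    """Return every run of identical characters at least min_len long (hypothetical helper)."""
    runs = []
    for _, group in groupby(seq):
        run = ''.join(group)  # groupby yields an iterator over the equal characters
        if len(run) >= min_len:
            runs.append(run)
    return runs

print(repeated_runs('ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'))
# ['TTTTT', 'AAA', 'CCCCCC', 'GGGG']
```

Passing `min_len=2` would also report pairs such as `AA`.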
With a regular expression that uses a backreference, together with the findall method, you can find the repeats quite easily:
import re

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
# ([A-Z]) captures one letter; \2\2+ requires it to repeat at least twice more (a run of 3+)
hits = re.findall(r'(([A-Z])\2\2+)', seq)
# findall returns (whole_run, letter) tuples; keep only the whole run
print([hit[0] for hit in hits])
['TTTTT', 'AAA', 'CCCCCC', 'GGGG']
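A variant of the regex approach uses `re.finditer`, which yields match objects, so `group(0)` gives the whole run directly and `start()` gives its position in the sequence. This is a sketch: restricting the character class to `[ACGT]` assumes an uppercase DNA alphabet, and `\1{2,}` requires at least two more copies of the first base, i.e. a run of three or more.

```python
import re

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'

# ([ACGT]) captures one base; \1{2,} requires it to repeat at least twice more
pattern = re.compile(r'([ACGT])\1{2,}')

# Collect (start index, whole run) for every match
runs = [(m.start(), m.group(0)) for m in pattern.finditer(seq)]
for start, run in runs:
    print(start, run)
```

Since each match object carries its own span, there is no tuple filtering step as with `findall` on a pattern with two groups.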
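Or, without any imports, with a plain loop:
seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
while seq:
    value = seq[0]
    repeats = 1
    # Count how often the first character repeats; the length check
    # stops the scan at the end of the string instead of raising IndexError
    while repeats < len(seq) and seq[repeats] == value:
        repeats += 1
    if repeats > 2:  # only runs of three or more, as asked
        print(value * repeats)
    # Drop the run just processed and continue with the rest
    seq = seq[repeats:]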
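The loop above re-slices `seq` after every run, and each `seq[repeats:]` builds a new string, so long sequences get copied over and over. A minimal sketch that scans with two indices instead and never copies the remaining sequence is below; the generator name `iter_runs` and its `min_len` parameter are hypothetical, and it also reports where each run starts.

```python
def iter_runs(seq, min_len=3):
    """Yield (start, run) for each run of identical characters of length >= min_len.

    Hypothetical helper: scans with indices i and j instead of re-slicing seq.
    """
    i = 0
    n = len(seq)
    while i < n:
        j = i + 1
        # Advance j past every character equal to seq[i]
        while j < n and seq[j] == seq[i]:
            j += 1
        if j - i >= min_len:
            yield i, seq[i:j]
        i = j  # the next run starts where this one ended

for start, run in iter_runs('ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'):
    print(start, run)
```

Only the runs that are reported get sliced out; everything else is compared in place.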