In my previous question (How do I use a regular expression to get the content between square brackets?), I had a file like this:
#start gene g1
dog1
dog2
dog3
#protein sequence = [DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD]
#end gene g1
###
#start gene g2
cat1
cat2
cat3
#protein sequence = [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
#CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
#end gene g2
###
#start gene g3
pig1
pig2
pig3
...
I want to extract the content between the brackets and write it to a new file named 50267.fa, like this:
>g1_50267
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>g2_50267
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
...
I got this answer:
import re
with open("50267.gff", "r") as ff:
    matches = re.findall(r'\[([^\]]+)', ff.read())
matches = ['>g' + str(ind+1) + "_50267\n" + x.replace('\n#', ' ') for ind, x in enumerate(matches)]
#print(matches)
with open('50267.fa', 'w') as fa:
    fa.write("\n".join(matches))
When I tried the code, it worked fine. But I don't understand what the following two pieces mean:
r'\[([^\]]+)'
x in enumerate(matches)
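To make the two pieces easier to look at in isolation, here is a small sketch that runs them on a shortened sample. The `sample` string and the `pairs` variable are made up for illustration and stand in for the real 50267.gff contents; the comments give my reading of the pattern based on how the `re` module and `enumerate` are documented.

```python
import re

# Hypothetical stand-in for the contents of 50267.gff (sequences shortened).
sample = (
    "#start gene g1\n"
    "dog1\n"
    "#protein sequence = [DDDD]\n"
    "#end gene g1\n"
    "#start gene g2\n"
    "cat1\n"
    "#protein sequence = [CCCC\n"
    "#CC]\n"
    "#end gene g2\n"
)

# \[       matches a literal "["
# ( ... )  capture group: re.findall returns only the captured part
# [^\]]+   one or more characters that are not "]" (newlines included)
matches = re.findall(r'\[([^\]]+)', sample)
print(matches)  # ['DDDD', 'CCCC\n#CC']

# enumerate pairs each item with its 0-based position, so
# "for ind, x in enumerate(matches)" unpacks each (index, item) tuple.
pairs = list(enumerate(matches))
print(pairs)  # [(0, 'DDDD'), (1, 'CCCC\n#CC')]
```

Seen this way, `x in enumerate(matches)` is not a piece of its own: it is the tail of `for ind, x in enumerate(matches)`, where `ind` is the position and `x` is the captured sequence, which is why the answer uses `ind+1` to build the g1, g2 headers.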