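I need to parse FASTA headers for the following terms: leaves, buds, stems, and shoots. If a sequence's header contains any of these terms, I want to open a file and write the sequence to it using Biopython.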
So I converted the records to a dictionary using SeqIO.to_dict:
from Bio import SeqIO
records_dict = SeqIO.to_dict(SeqIO.parse("my_example.fasta","fasta"))
But now I don't know how to get the terms out of the headers. The sequences look like this:
>gi|393741877|gb|FS945568.1|FS945568 FS945568 tea plant lateral roots cDNA library Camellia sinensis cDNA clone LR29G09, mRNA sequence
CCGGGGATCCATTCCAAAATTCATCATAAACCTCTCAATATTGTTCACTTGAAAAAAGATGA...
>gi|393741878|gb|FS945569.1|FS945569 FS945569 tea plant lateral roots cDNA library Camellia sinensis cDNA clone LR29G11, mRNA sequence
CCGGGGGCTATCGAGCACTCACCGACTCACTCGAGAGCTAATACAGTCCACAGC...
>gi|393751846|gb|FS959695.1|FS959695 FS959695 tea plant young leaves cDNA library Camellia sinensis cDNA clone YL16A05, mRNA sequence
CCAACAACTTCTTCCTAACACTACCACCTTCTGTCAACTTACTTCTCCAAAGGCTTCTTTCTTCCACCAT
GGCTGCTTCTACCATGGCTCTCTCTTCCCCATCTTTCGCCGGAAAGGCGGTGAAACTTGCCCCGGAG...
>gi|393751847|gb|FS959696.1|FS959696 FS959696 tea plant young leaves cDNA library Camellia sinensis cDNA clone YL16A06, mRNA sequence
GAAACTGCATATAGAAAATCTCACTACCACTCTCTTCCTCTTCCTCTCTATCTTTCCTACCAAAGAAAG...
>gi|393750830|gb|FS956287.1|FS956287 FS956287 tea plant terminal buds cDNA library Camellia sinensis cDNA clone TB26G04, mRNA sequence
AGGATCGCACGGCCTTTGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGA
TAGT...
>gi|393750831|gb|FS956288.1|FS956288 FS956288 tea plant terminal buds cDNA library Camellia sinensis cDNA clone TB26G05, mRNA sequence
TCCCACAAACATGTTGCTCTCATCTTTCCAGTAAAAGATAGAGAGAGAGAGAGAGAGAACAAAGCAG...
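
For reference, here is a minimal sketch of the kind of filtering I am after, not a tested solution. It assumes the tissue name appears as a whole word in `record.description` (the full header text that Biopython exposes for FASTA records). The regular expressions for singular and plural forms, the helper names `find_terms` and `split_fasta_by_term`, and the `{term}.fasta` output naming are all my own placeholders. It iterates over `SeqIO.parse` directly instead of building a dictionary, since only the headers need to be scanned.

```python
import re

# Search terms from the question. The singular/plural patterns are an
# assumption about how the tissue names appear in the headers.
TERM_PATTERNS = {
    "leaves": re.compile(r"\b(leaf|leaves)\b", re.IGNORECASE),
    "buds": re.compile(r"\bbuds?\b", re.IGNORECASE),
    "stems": re.compile(r"\bstems?\b", re.IGNORECASE),
    "shoots": re.compile(r"\bshoots?\b", re.IGNORECASE),
}


def find_terms(description):
    """Return the search terms that occur as whole words in a header."""
    return [term for term, pattern in TERM_PATTERNS.items()
            if pattern.search(description)]


def split_fasta_by_term(fasta_path, out_template="{term}.fasta"):
    """Write each matching record to one FASTA file per term.

    out_template is a hypothetical naming scheme. Returns a dict mapping
    each term that matched to the number of records written for it.
    """
    # Imported here so that find_terms can be tried without Biopython.
    from Bio import SeqIO

    buckets = {term: [] for term in TERM_PATTERNS}
    for record in SeqIO.parse(fasta_path, "fasta"):
        # For FASTA input, record.description is the header without ">".
        for term in find_terms(record.description):
            buckets[term].append(record)

    counts = {}
    for term, records in buckets.items():
        if records:  # skip terms with no hits so no empty files are created
            out_path = out_template.format(term=term)
            counts[term] = SeqIO.write(records, out_path, "fasta")
    return counts
```

Calling `split_fasta_by_term("my_example.fasta")` on the sample above should produce one file for the "young leaves" records and one for the "terminal buds" records, while the "lateral roots" records are skipped because roots are not among the search terms.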