python - Downloading only part of a GenBank file with Biopython

Question

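I'm new to Biopython and I have a performance problem when parsing GenBank files.

I have to parse a lot of gb files, for which I have the accession numbers. After parsing, I only want to check the taxonomy and the organelle of each file. Right now I have this code: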

from Bio import SeqIO
from Bio import Entrez
gb_acc1 = Entrez.efetch(db='nucleotide', id=access1, rettype='gb', retmode='text')  # access1 holds the accession number
rec = SeqIO.read(gb_acc1, 'genbank')
cache[access1] = rec  # cache is just a dictionary holding the gb records already downloaded
feat = cache[access1].features[0]
if 'organelle' in feat.qualifiers:  # and the code goes on

To look up the taxonomy, I have:

gi_h = Entrez.efetch(db='nucleotide', id=access, rettype='gb', retmode='text')
gi_rec = SeqIO.read(gi_h, 'genbank')
cache[access] = gi_rec
if cache[access].annotations['taxonomy'][1] == 'Fungi':
    fungi += 1  # and the code goes on

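This (the whole script) works fine. My problem is that I download the entire gb file (sometimes huge) just to look at these two features: the organelle and the taxonomy. If I could download only that part of the gb file, my script would be much faster, but I have not figured out whether that is possible.

Does anyone know whether this can be done and, if so, how? Thanks a lot in advance.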

score 1 · Accepted Answer

You could use seq_start and seq_stop to truncate your sequence and then parse it as before, e.g.

gb_acc1 = Entrez.efetch(db='nuccore', id=access1, rettype='gb', retmode='text', seq_start=1, seq_stop=1)  # retmode='text' so SeqIO.read(gb_acc1, 'genbank') still works

Maybe you don't even need to store the whole GenBank file, but only a dictionary with the ID as key and the taxonomy and organelle as values?
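A minimal sketch of that dictionary idea, not a definitive implementation. The helper names summarize_record and fetch_summary are hypothetical, and the e-mail address is a placeholder you would supply yourself. It assumes that the truncated record still carries the taxonomy annotation and a "source" feature with the organelle qualifier, which is worth checking on a few of your own accessions first.

```python
def summarize_record(rec):
    """Keep only the taxonomy list and the organelle (or None) of a SeqRecord."""
    taxonomy = list(rec.annotations.get("taxonomy", []))
    organelle = None
    for feat in rec.features:
        # The organelle qualifier lives on the "source" feature, if present.
        if feat.type == "source":
            values = feat.qualifiers.get("organelle")
            organelle = values[0] if values else None
            break
    return {"taxonomy": taxonomy, "organelle": organelle}


def fetch_summary(accession, email):
    """Fetch a one-base slice of the record and summarize it (needs network)."""
    # Imported here so summarize_record can be used on its own.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # placeholder: NCBI asks for a contact address
    handle = Entrez.efetch(db="nuccore", id=accession, rettype="gb",
                           retmode="text", seq_start=1, seq_stop=1)
    try:
        rec = SeqIO.read(handle, "genbank")
    finally:
        handle.close()
    return summarize_record(rec)
```

With this, cache[access1] = fetch_summary(access1, "you@example.org") stores a small dictionary instead of the full record, and the checks become cache[access1]["organelle"] is not None and "Fungi" in cache[access1]["taxonomy"].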
