
I am trying to add more than 70,000 new features to a GenBank file using Biopython.

I have this code:

from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

for result in results:
     start = 0
     end = 0

     result = result.split("\t")
     start = int(result[0])
     end = int(result[1])

     for record in SeqIO.parse(fi, "gb"):
         record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
         SeqIO.write(record, fo, "gb")

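results is simply a list of tab-separated lines, each holding the start and end of one feature that I need to add to the original gbk file.

This solution is very expensive for my computer, and I don't know how to improve its performance. Any good ideas?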


1 Answer


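You should parse the GenBank file only once. Leaving aside what results contains (I don't know exactly, because some code is missing from your example), I would guess that something like this modification of your code would improve performance: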

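from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

# Parse the GenBank file once and keep the records in memory
original_records = list(SeqIO.parse(fi, "gb"))

for result in results:
    # Each result is a tab-separated "start<TAB>end" line
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])

    for record in original_records:
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))

# Write all records once, after every feature has been added;
# writing inside the loop would overwrite the output file on every pass
SeqIO.write(original_records, fo, "gb")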
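
For what it's worth, the same idea can be wrapped into two small functions. This is only a sketch under assumptions: the names parse_coordinates and add_features are made up for illustration, and I am assuming each entry of results is a "start<TAB>end" line, as the question describes. The Biopython imports sit inside add_features so that the coordinate parsing can be tried on its own.

```python
def parse_coordinates(lines):
    """Turn tab-separated 'start<TAB>end' lines into (start, end) integer pairs."""
    coords = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        coords.append((int(fields[0]), int(fields[1])))
    return coords


def add_features(in_path, out_path, coords, feature_type="misc_feat"):
    """Append one feature per coordinate pair to every record, then write once."""
    # Imported here so parse_coordinates stays usable without Biopython installed.
    from Bio import SeqIO
    from Bio.SeqFeature import SeqFeature, FeatureLocation

    # Parse the input file a single time.
    records = list(SeqIO.parse(in_path, "gb"))
    for record in records:
        record.features.extend(
            SeqFeature(FeatureLocation(start, end), type=feature_type)
            for start, end in coords
        )
    # One write call for all records; SeqIO.write returns the number written.
    return SeqIO.write(records, out_path, "gb")
```

With the file names from the question, this would be called as add_features("myoriginal.gbk", "mynewfile.gbk", parse_coordinates(results)). The input is read once and the output is written once, however many features are added.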
answered 2015-07-22T11:59:57.863