python - Reading multiple BLAST files (Biopython)

Question

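I am trying to read a list of XML files generated by submitting multiple sequences to the NCBI BLAST website. From each file, I want to print certain lines of information. The files I want to read all end with the suffix "_recombination.xml".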

for file in glob.glob("*_recombination.xml"):
    result_handle= open(file)
    blast_record=NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct

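The script first finds all files with the "_recombination.xml" suffix. I then want it to read each file and print certain lines (this is almost a direct copy of the Biopython cookbook), which it seems to do. But I get the following error: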

Traceback (most recent call last):
File "Scripts/blast_test.py", line 202, in <module>
blast_record=NCBIXML.read(result_handle)
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
first = iterator.next()
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
expat_parser.Parse("", True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594

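I'm not sure what the problem is. I don't know whether it is trying to loop back over files it has already read - for example, closing the file seems to help: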

for file in glob.glob("*_recombination.xml"):
    result_handle= open(file)
    blast_record=NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct
    result_handle.close()
    blast_record.close()

But this gives me another error:

Traceback (most recent call last):
File "Scripts/blast_test.py", line 213, in <module>
blast_record.close()
AttributeError: 'Blast' object has no attribute 'close'

score 3 · Accepted Answer

I usually use the parse method instead of read; maybe it will help you:

from Bio.Blast import NCBIXML

# input_xml is the path to a single BLAST XML file
for blast_record in NCBIXML.parse(open(input_xml)):
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct

Also make sure you run your BLAST query with -outfmt 5 so that the output is XML.
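For readers on Python 3, here is a minimal sketch of the same idea applied to the whole file list from the question. It is not the answerer's code: the function names format_hsp and report are made up for illustration, and it assumes Biopython is installed with Bio.Blast.NCBIXML available. NCBIXML.read expects exactly one record per file, while NCBIXML.parse iterates over however many records the file holds. Only the file handle needs closing (the record object has no close method), and the with block takes care of that.

```python
import glob


def format_hsp(title, length, expect, query, match, sbjct):
    """Build the report text for one HSP, mirroring the prints in the thread."""
    return "\n".join([
        "*****Alignment****",
        "sequence: {}".format(title),
        "length: {}".format(length),
        "e-value: {}".format(expect),
        query,
        match,
        sbjct,
    ])


def report(pattern="*_recombination.xml"):
    """Print every HSP of every BLAST record in the files matching `pattern`."""
    # Biopython is a third-party package; it is imported here so that
    # format_hsp can be used even where Biopython is not installed.
    from Bio.Blast import NCBIXML

    for path in sorted(glob.glob(pattern)):
        # The with block closes the handle even if parsing raises.
        with open(path) as handle:
            # parse() yields one record per query in the file.
            for record in NCBIXML.parse(handle):
                for alignment in record.alignments:
                    for hsp in alignment.hsps:
                        print(format_hsp(alignment.title, alignment.length,
                                         hsp.expect, hsp.query,
                                         hsp.match, hsp.sbjct))
```

Calling report() in the directory that holds the XML files prints the same fields as the loop in the question.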

score 0 · Accepted Answer

I would add a comment to Biogeek's answer, but I can't (not enough reputation yet). He is right, though; you should use

NCBIXML.parse(open(input_xml))

instead of NCBIXML.read(open(input_xml)), because you are "trying to read a list of XML files", and for a list of XML files you need parse rather than read. Did that solve your problem?
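The thread does not establish what caused the ExpatError in the first traceback. One possibility, offered here only as an assumption, is that one of the files is incomplete (for example a download that was cut short): expat reports "no element found" when the input ends before the root element is closed. The sketch below uses only the standard library to list the files that are not well-formed XML. The helper names xml_error and find_broken are hypothetical.

```python
import glob
import xml.parsers.expat


def xml_error(path):
    """Return None if the file is well-formed XML, else the expat error text."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # expat's ParseFile reads from the handle in binary mode.
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


def find_broken(pattern="*_recombination.xml"):
    """Map each matching file that fails to parse to its error text."""
    broken = {}
    for path in sorted(glob.glob(pattern)):
        error = xml_error(path)
        if error is not None:
            broken[path] = error
    return broken
```

Running find_broken() before the BLAST loop would show whether a particular file, rather than the loop itself, triggers the error.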
