I am trying to parse BLAST output in XML format using re. I have never done this before; my code is below.

However, because some hits have more than one Hsp_num, I get more results for query_from and query_to than for query_len. How can I specify that, if Hsp_num is more than 1, query_len should be printed again for it? Thank you.

import re

with open('file.xml', 'r') as xml, open('result.txt', 'w') as output:
    for line in xml:
        line = line.strip()
        if line.startswith('<Hsp_num>'):
            # Drop the opening and closing tags, keeping only the text between them.
            Hsp_num = re.sub(r'<[^>]+>', '', line)
        elif line.startswith('<Hsp_query-from>'):
            query_from = re.sub(r'<[^>]+>', '', line)
        elif line.startswith('<Hsp_query-to>'):
            query_to = re.sub(r'<[^>]+>', '', line)
            # Hsp_num and Hsp_query-from precede Hsp_query-to in BLAST XML,
            # so all three values are set by the time this line is reached.
            output.write(Hsp_num + '\t' + query_from + '\t' + query_to + '\n')

I extracted query_len in a separate file, because it did not work together with the code above:

import re

with open('file.xml', 'r') as xml:
    for line in xml:
        line = line.strip()
        if line.startswith('<Iteration_query-len>'):
            # Drop the tags, keeping only the query length.
            query_len = re.sub(r'<[^>]+>', '', line)
1 Answer

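Are you familiar with Biopython? Its Bio.Blast.NCBIXML module may be exactly what you need. Chapter 7 of the Tutorial and Cookbook is all about BLAST, and section 7.3 deals with parsing. You will get a feel for how it works, and it will be much easier than parsing XML with regular expressions, which only leads to tears and mental breakdowns.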
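
To illustrate the point about using a real XML parser rather than regular expressions, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of Biopython, purely so that it runs without extra installs. The function names hsp_rows and write_rows are made up for this example, and it assumes the usual BLAST XML layout in which every Hsp element sits somewhere below an Iteration element that carries Iteration_query-len.

```python
import xml.etree.ElementTree as ET


def hsp_rows(xml_path):
    """Yield (hsp_num, query_from, query_to, query_len) for every HSP.

    hsp_rows is a hypothetical helper name, not part of any library.
    """
    tree = ET.parse(xml_path)
    # Each <Iteration> describes one query; its length is stored once there.
    for iteration in tree.iter('Iteration'):
        query_len = iteration.findtext('Iteration_query-len', default='')
        # Every HSP below this iteration reuses the same query_len, so a hit
        # with several HSPs gets the length repeated on each row.
        for hsp in iteration.iter('Hsp'):
            yield (
                hsp.findtext('Hsp_num', default=''),
                hsp.findtext('Hsp_query-from', default=''),
                hsp.findtext('Hsp_query-to', default=''),
                query_len,
            )


def write_rows(xml_path, out_path):
    """Write one tab-separated line per HSP (hypothetical helper)."""
    with open(out_path, 'w') as output:
        for row in hsp_rows(xml_path):
            output.write('\t'.join(row) + '\n')
```

Calling write_rows('file.xml', 'result.txt') should then produce one line per HSP, with query_len repeated whenever a hit has more than one HSP. Note that ET.parse reads the whole file into memory, which may matter for very large BLAST results.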

Answered 2014-01-23T17:13:15.667