I'm just getting started with Biopython and have a question about parsing results. I followed a tutorial to get going, and this is the code I used:

from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("/Users/jcastrof/blast/pruebarpsb.xml")):
    if record.alignments:
        print "Query: %s..." % record.query[:60]
        for align in record.alignments:
            for hsp in align.hsps:
                print " %s HSP,e=%f, from position %i to %i" \
                      % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end)

Part of the output I get is:

 gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
 gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
 gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
 gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192
 gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850

What I want to do is sort the results by hit position (Hsp_hit-from), like this:

 gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
 gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
 gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850
 gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
 gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192

My rps-blast input file is a *.xml file. Any suggestions on how to proceed?

Thanks!

1 Answer

The list of HSPs is just a Python list and can be sorted as usual. Try:

align.hsps.sort(key = lambda hsp: hsp.query_start)

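However, you are dealing with a nested list (each hit has its own list of HSPs), and you want to sort across all of them. Building your own list is probably best here, something like this: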

for record in ...:
    print("Query: %s..." % record.query[:60])
    # Outer loop first: iterate over alignments, then over each alignment's HSPs.
    hits = sorted((hsp.query_start, hsp.query_end, hsp.expect, align.hit_id)
                  for align in record.alignments for hsp in align.hsps)
    for q_start, q_end, expect, hit_id in hits:
        print(" %s HSP,e=%f, from position %i to %i"
              % (hit_id, expect, q_start, q_end))
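
To try the flatten-and-sort idea without a BLAST XML file at hand, here is a minimal sketch in Python 3. The `Hsp` and `Alignment` namedtuples and the `hsps_by_query_start` helper are hypothetical stand-ins, not part of Biopython; they model only the attributes used in this thread. The coordinates are copied from the question's output, while the e-values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for Biopython's alignment and HSP objects;
# only the attributes used in this thread are modelled.
Hsp = namedtuple("Hsp", "query_start query_end expect")
Alignment = namedtuple("Alignment", "hit_id hsps")


def hsps_by_query_start(alignments):
    """Flatten every alignment's HSPs into one list sorted by query start."""
    # The outer loop (alignments) comes first in the comprehension,
    # so `align` is bound before `align.hsps` is read.
    pairs = [(align, hsp) for align in alignments for hsp in align.hsps]
    # sorted() is stable, so HSPs with equal query_start keep their input order.
    return sorted(pairs, key=lambda pair: pair[1].query_start)


# Coordinates from the question's output; e-values are invented.
alignments = [
    Alignment("gnl|CDD|225858", [Hsp(32, 1118, 1e-120), Hsp(1775, 2671, 3e-95)]),
    Alignment("gnl|CDD|214836", [Hsp(37, 458, 2e-60), Hsp(1775, 2192, 5e-58)]),
    Alignment("gnl|CDD|214838", [Hsp(567, 850, 4e-30)]),
]

lines = [
    " %s HSP,e=%g, from position %i to %i"
    % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end)
    for align, hsp in hsps_by_query_start(alignments)
]
print("\n".join(lines))
```

Two design choices here are worth flagging. Sorting with a `key` on `query_start` alone leaves ties in their original order, which matches the ordering shown in the question; sorting whole tuples would instead break ties on the end position. The sketch also formats the e-value with `%g` rather than `%f`, since `%f` rounds to six decimal places and prints very small e-values as 0.000000.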

Peter

answered 2013-04-17T21:17:40.537