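I am using Biopython to do something similar to this, sorting rps-blast results by hit position, but I would like to join or concatenate the local hits so that I end up with contiguous query and subject hits.
My code: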
    # records is assumed to be the parsed BLAST output, e.g. from Bio.Blast.NCBIXML.parse()
    for record in records:
        for alignment in record.alignments:
            # Sort the HSPs of this alignment by query start position
            hits = sorted(
                (hsp.query_start, hsp.query_end, hsp.sbjct_start, hsp.sbjct_end,
                 alignment.title, hsp.query, hsp.sbjct)
                for hsp in alignment.hsps)
            for q_start, q_end, sb_start, sb_end, title, query, sbjct in hits:
                print(title)
                print('The query starts from position: ' + str(q_start))
                print('The query ends at position: ' + str(q_end))
                print('The hit starts at position: ' + str(sb_start))
                print('The hit ends at position: ' + str(sb_end))
                print('The query is: ' + query)
                print('The hit is: ' + sbjct)
This gives the sorted results:
Species_1
The query starts from position: 1
The query ends at position: 184
The hit starts at position: 1
The hit ends at position: 552
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 390
The query ends at position: 510
The hit starts at position: 549
The hit ends at position: 911
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 492
The query ends at position: 787
The hit starts at position: 889
The hit ends at position: 1776
The query is: #######query_seq
The hit is: ######### hit_seq
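This all works fine, but I would like to take the next logical step, which is to concatenate all three sub-queries and sub-hits shown here (the number of hits does vary) to get the complete query and subject sequences. What is the way forward?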
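
One possible direction, as a minimal sketch rather than a definitive solution: walk through the HSPs in query order, pad query regions that no HSP covers, and drop alignment columns that overlap an earlier HSP (in the output above, query positions 492 to 510 are covered twice). The helper name `join_hsps`, the `filler` parameter, and the toy data are hypothetical and only for illustration. The sketch assumes the query coordinates are 1-based and inclusive, that `hsp.query` and `hsp.sbjct` are the equal-length aligned strings with gaps written as '-', and that all HSPs lie in the same order and orientation on the subject.

```python
def join_hsps(hsps, filler="X"):
    """Join aligned HSP fragments into one query and one subject string.

    hsps: iterable of (q_start, q_end, query, sbjct) tuples. Coordinates are
    1-based and inclusive on the query; query and sbjct are the equal-length
    aligned strings of one HSP (gaps written as '-').
    """
    joined_query, joined_sbjct = [], []
    covered = 0  # highest query position already written out
    for q_start, q_end, query, sbjct in sorted(hsps):
        # Query region with no hit: pad so the coordinates stay in register.
        missing = q_start - covered - 1
        if missing > 0:
            joined_query.append(filler * missing)
            joined_sbjct.append("-" * missing)
        # Copy alignment columns, skipping those that overlap earlier HSPs.
        pos = q_start - 1
        for q_char, s_char in zip(query, sbjct):
            if q_char != "-":
                pos += 1  # only non-gap query characters consume a position
            if pos > covered:
                joined_query.append(q_char)
                joined_sbjct.append(s_char)
        covered = max(covered, q_end)
    return "".join(joined_query), "".join(joined_sbjct)


# Toy data (made up): the second fragment overlaps the first by two
# positions, and query positions 7-8 are not covered by any fragment.
example = [
    (1, 4, "ABCD", "ABCD"),
    (3, 6, "CDEF", "CDEF"),
    (9, 10, "IJ", "IJ"),
]
joined_query, joined_sbjct = join_hsps(example)
```

With the loop from the question, the input could be built per alignment as `(hsp.query_start, hsp.query_end, hsp.query, hsp.sbjct) for hsp in alignment.hsps`. Overlaps are resolved in favour of the earlier HSP here; keeping the higher-scoring one instead would be an equally reasonable choice. The trimming is done in query coordinates only, so whether the joined subject string is biologically meaningful depends on the hits being collinear on the subject as well.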