I have a file with annotations in the following format:
XS-5236245.2_hypothetical_protein
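The accession ID is the part of an annotation line before the first underscore, and the description after it may contain more underscores. Here is a minimal sketch of pulling the two apart. It assumes that IDs never contain `_` themselves and that every line has at least one underscore. Passing `maxsplit=1` leaves the description intact.

```python
line = "XS-5236245.2_hypothetical_protein\n"

# Assumption: the accession ID never contains '_', so it is everything
# before the first underscore; maxsplit=1 keeps the rest as one string.
accession_id, description = line.rstrip("\n").split("_", 1)

print(accession_id)  # → XS-5236245.2
print(description)   # → hypothetical_protein
```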
I also have a tab-delimited BLAST report that contains only the accession ID in the second column:
transcript1 XS-5236245.2 94.3 35 0 245 356 789 896 1e-230 6.3
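When the IDs match, I want to replace the accession_id in the BLAST report with the whole corresponding line from the annotation file. My attempt is below. As you can see, I only know very basic Python, so I would appreciate a more sophisticated solution if you have one. Thanks for your help.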
林努
#!/usr/bin/python
import sys
# input1 = sys.argv[1] --> file with annotations
# input2 = sys.argv[2] --> file with blast report
# output = sys.argv[3] --> modified blast report with annotations
f1 = open(sys.argv[1], "r")
f2 = open(sys.argv[2], "r")
f3 = open(sys.argv[3], "w")
# open and read the annotation file line by line:
for line in f1:
    # drop the trailing newline, then break the line at the first '_'
    annotation = line.rstrip("\n")
    splitline = annotation.split("_", 1)
    # define search_id as the first element of the line
    searchid = splitline[0]
    # rewind the blast report: a file object is exhausted after one pass,
    # so without seek(0) only the first annotation would ever be compared
    f2.seek(0)
    # read the blast report line by line
    for row in f2:
        # split columns by tab separator
        col = row.split("\t")
        # skip blank or malformed lines that have no second column
        if len(col) < 2:
            continue
        # define target_id as the content of the second column
        targetid = col[1]
        # when target_id matches search_id, replace that column with the
        # whole annotation line and write out the complete row
        if searchid == targetid:
            col[1] = annotation
            f3.write("\t".join(col))
f1.close()
f2.close()
f3.close()
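The loop above re-reads the whole BLAST report once for every annotation line. A more efficient approach is to read the annotation file once into a dictionary keyed by accession ID, then stream the BLAST report a single time and look up column 2 in that dictionary. The sketch below works that way and rests on a few assumptions. Accession IDs contain no `_`. Rows without a matching annotation are kept unchanged, whereas the loop above drops them. Output follows the order of the BLAST report. The function names `load_annotations`, `annotate` and `main` are hypothetical.

```python
#!/usr/bin/env python3
"""Add annotations to column 2 of a tab-separated BLAST report.

Usage: python script.py ANNOTATIONS BLAST_REPORT OUTPUT
"""
import sys


def load_annotations(lines):
    """Map accession ID (text before the first '_') to the full annotation line."""
    annotations = {}
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            annotations[line.split("_", 1)[0]] = line
    return annotations


def annotate(blast_lines, annotations):
    """Yield BLAST rows with column 2 replaced by its annotation when one exists."""
    for row in blast_lines:
        columns = row.rstrip("\n").split("\t")
        if len(columns) > 1:
            # dict.get falls back to the original ID when there is no annotation
            columns[1] = annotations.get(columns[1], columns[1])
        yield "\t".join(columns) + "\n"


def main(annotation_path, blast_path, output_path):
    # Read the annotation file once: each later lookup is a dict access.
    with open(annotation_path) as f:
        annotations = load_annotations(f)
    # Stream the BLAST report so a large report need not fit in memory.
    with open(blast_path) as blast, open(output_path, "w") as out:
        out.writelines(annotate(blast, annotations))


# Run only when arguments are given, so the functions can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    if len(sys.argv) != 4:
        sys.exit("usage: %s ANNOTATIONS BLAST_REPORT OUTPUT" % sys.argv[0])
    main(*sys.argv[1:4])
```

Splitting the work into small functions that take iterables of lines means the matching logic can be tried on plain lists, e.g. `list(annotate(["t1\tXS-5236245.2\t94.3\n"], load_annotations(["XS-5236245.2_hypothetical_protein\n"])))`, without touching any files.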