python - Replace accession_id in a BLAST report with the full annotation from a separate file

Question

I have a file with annotations in the following format:

  XS-5236245.2_hypothetical_protein

and a tab-delimited BLAST report with only the accession ID in the second column:

  transcript1  XS-5236245.2  94.3  35  0  245  356  789  896  1e-230 6.3

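When there is a match, I want to replace the accession_id in the BLAST report with the whole line from the annotation file. This is my attempt; as you can see, I use very basic Python. I would appreciate a more sophisticated solution. Thanks for your help.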

林努

#!/usr/bin/python
import sys

# sys.argv[1] --> file with annotations
# sys.argv[2] --> file with blast report
# sys.argv[3] --> modified blast report with annotations

f1 = open(sys.argv[1], "r")
f2 = open(sys.argv[2], "r")
f3 = open(sys.argv[3], "w")

# read the annotations once and store each whole line under its accession id
annotations = {}
for line in f1:
    line = line.rstrip("\n")
    # break line at the first '_'
    splitline = line.split("_", 1)
    # define search_id as the first element of the line
    searchid = splitline[0]
    annotations[searchid] = line

# read the blast report line by line
for row in f2:
    # split columns by tab separator
    col = row.rstrip("\n").split("\t")
    # when the second column (target_id) matches a search_id,
    # replace it with the whole annotation line
    if len(col) > 1 and col[1] in annotations:
        col[1] = annotations[col[1]]
    # write every row, matched or not
    f3.write("\t".join(col) + "\n")

f1.close()
f2.close()
f3.close()

score 0 · Accepted Answer

I found the following solution:

Create a new file (accession_headers.txt) with two columns: the accession ID in the first column and the full header in the second. This is easy with Python:

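#!/usr/bin/env python
import sys

# sys.argv[1]: annotation file, one header per line
# sys.argv[2]: output file with two columns (accession ID, full header)
with open(sys.argv[1], 'r') as f1, open(sys.argv[2], 'w') as f2:
    for line in f1:
        # the accession ID is everything before the first '_'
        accessionid = line.split('_', 1)[0]
        # 'line' still ends with its newline, so none is added here
        f2.write('{0} {1}'.format(accessionid, line))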

The BLAST report has the following format:

c16_g1_i1 len=581 path=[12725:0-580]    XS-5236245.2    94.9    59  3   0   403 579 254 312 8.6e-27 116.3

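I applied the following awk command, which I adapted to fit my files. Apologies, I am not an awk expert and cannot fully explain each step; perhaps someone can contribute an explanation of what each step does: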
```
awk 'NR==FNR{a[$1]=$2;next}$4 in a{$4=a[$4]}1' accession_headers.txt blast.report > outfile 
```
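
Since an explanation was requested, here is how the one-liner reads, followed by a rough Python equivalent. `NR==FNR` is true only while awk reads the first file (the overall record number still equals the per-file record number), so `a[$1]=$2` stores each full header under its accession ID and `next` skips the remaining rules. For the second file, `$4 in a{$4=a[$4]}` replaces field 4 when it is a known accession ID, and the trailing `1` is an always-true pattern that prints every line. By default awk splits on runs of whitespace, which is why the accession ID is field 4 here: the query name `c16_g1_i1 len=581 path=[12725:0-580]` counts as three fields. The sketch below mirrors that behaviour; the function names `load_headers` and `annotate` are made up for illustration, and it assumes the headers contain no whitespace.

```python
import sys


def load_headers(lines):
    """Map accession ID -> full header, like a[$1]=$2 in the awk command."""
    headers = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            headers[fields[0]] = fields[1]
    return headers


def annotate(lines, headers, column=4):
    """Yield report lines with the given 1-based whitespace field replaced."""
    for line in lines:
        fields = line.split()
        if len(fields) >= column and fields[column - 1] in headers:
            fields[column - 1] = headers[fields[column - 1]]
            # like awk, a modified line is rejoined with single spaces
            yield ' '.join(fields) + '\n'
        else:
            # unmatched lines pass through unchanged
            yield line


if __name__ == '__main__' and len(sys.argv) == 4:
    # usage: script.py accession_headers.txt blast.report outfile
    with open(sys.argv[1]) as hdr, open(sys.argv[2]) as rep, \
            open(sys.argv[3], 'w') as out:
        out.writelines(annotate(rep, load_headers(hdr)))
```

Note that, as with the awk command, a line whose field was replaced comes out space-separated rather than tab-separated. If the tabs need to survive, splitting on `'\t'` and replacing the second column instead is probably the simpler route.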
