1

I want to take each word (ID) listed in file1 and use it to grep what follows its match in file2.fasta. I then want to append whatever followed the match to the word I searched for and write the result to file3, so that file3 contains information from both files. Excerpts from my files:

file1:

Jan12345: ID1 ID2 ... IDN1
Jan67899: ID11 ID12 ... IDN2

And a FASTA file (file2) like this:

>ID1
ABCDEFG
>ID2
HIJKLMN
>IDN1
OPQRSTU
>ID11
WXYZABC
>ID12
DEFGHIJ
>IDN2
KLMNOPQ

For this example, the output I want is:

Jan12345: ID1 ABCDEFG ID2 HIJKLMN ... IDN1 OPQRSTU
Jan67899: ID11 WXYZABC ID12 DEFGHIJ ... IDN2 KLMNOPQ

As you can see, I simply want to add the FASTA sequences, which are contained in file2, to file1. If anyone knows how to do this, I would greatly appreciate it!

3 Answers

2

One way with awk:

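awk '
# First file (file2): map each FASTA ID to the sequence line that follows it.
# /^>/ matches a literal ">" at line start (in gawk, \> is a word boundary).
NR==FNR {
    if (/^>/) {
        id = substr($0, 2)                  # drop the leading ">"
        if ((getline seq) > 0) a[id] = seq  # next line holds the sequence
    }
    next
}
# Second file (file1): append the sequence after every known ID.
{
    for (i=2; i<=NF; i++)
        if ($i in a)
            $i = $i " " a[$i]
}1' file2 file1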

As a one-liner:

awk 'NR==FNR{NF==2?k=$2:a[k]=$1;next}{for(i=2;i<=NF;i++){for(k in a){$i=$i==k?$i OFS a[k]:$i}}}1' FS="[> ]" file{2,1}

Output with your sample data:

$ awk 'NR==FNR {NF==2?k=$2:a[k]=$1;next}{for(i=2;i<=NF;i++){for(k in a){$i=$i==k?$i OFS a[k]:$i}}}1' FS="[> ]" file{2,1}
Jan12345: ID1 ABCDEFG ID2 HIJKLMN IDN1 OPQRSTU
Jan67899: ID11 WXYZABC ID12 DEFGHIJ IDN2 KLMNOPQ
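
Both awk versions above assume that every sequence sits on a single line directly after its header. FASTA files often wrap a sequence across several lines, so here is a minimal sketch of a variant that concatenates all lines up to the next header. The scratch directory, the wrapped sample records, and the array name seq are illustrative assumptions, not part of the original data.

```shell
# Hypothetical scratch directory holding the sample data;
# ID1 and ID12 are deliberately wrapped over two lines.
tmp=$(mktemp -d)
printf '%s\n' 'Jan12345: ID1 ID2 IDN1' 'Jan67899: ID11 ID12 IDN2' > "$tmp/file1"
printf '%s\n' '>ID1' 'ABCD' 'EFG' '>ID2' 'HIJKLMN' '>IDN1' 'OPQRSTU' \
              '>ID11' 'WXYZABC' '>ID12' 'DEFG' 'HIJ' '>IDN2' 'KLMNOPQ' > "$tmp/file2"

awk '
NR==FNR {                           # first file: the FASTA file
    if (/^>/) {
        id = substr($1, 2)          # header: keep the ID, ignore any description
    } else if (id != "") {
        seq[id] = seq[id] $0        # sequence line: append to the current ID
    }
    next
}
{                                   # second file: the ID list
    for (i = 2; i <= NF; i++)
        if ($i in seq)
            $i = $i " " seq[$i]
    print
}' "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

With these sample files, file3 contains the same two lines as the output shown above, because the wrapped pieces are joined before the lookup.
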
answered 2013-06-13T16:12:11.100
2

Read the FASTA file (file2) into the hash %h, then substitute the matches on every line of file1:

perl -pe 'BEGIN{open F,pop;%h=map{y|\r\n>||d;$_}<F>} s|(ID\S+)|$1 $h{$1}|g' file1 file2
answered 2013-06-13T17:13:51.243
1

An ugly way with GNU sed:

  • Step 1: build the command script

    sed -r 's#^(\S+)\s+#${x;s/^\\s\\\|>//g;p};1{s/.*/\1/;h};/\n#;h;s/\n.*//;x;s/.*\n//;:ka;s#(\S+)\s*#\\b\1\\b\\| #;H;g;s/\n(\S+).*/\1/;x;s/.*\n\S+\s*//;tka;s/\\\|\n/\/!d;$!N;H;x;s\/\\n\/ \/g;x/' file1 > file.sed
    
  • Step 2: build the result file with bash

    #!/bin/bash
    while read p; do 
    sed -n $p file2
    done < file.sed > file3
    
answered 2013-06-15T22:18:56.393