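I'm trying to merge these two datasets with awk. I want every pair of lines where fileA.$1 equals fileB.$1 and the average of fileA.$4 and fileA.$5 lies strictly between fileB.$2 and fileB.$3 (fileA.$1 = fileB.$1 AND fileB.$2 < average(fileA.$4, fileA.$5) < fileB.$3). Could anyone do this as a one-liner?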

fileA                           
chr1    Mot TF  500 700 0.9893  target1 600
chr1    Mot TF  100 300 0.9893  target1 200
chr1    Mot TF  1000    2000    0.9893  target1 1500
chr2    Mot TF  500 700 0.9502  target2 600

fileB       
chr1    500 1000
chr1    400 800
chr1    100 800
chr3    100 500

desired result                              
chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200

2 Answers

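#!/usr/bin/awk -f

BEGIN {
    FS = OFS = "\t"    # input and output are tab-separated
}
# First file (fileA): store each line, its key, and the mean of $4 and $5.
NR == FNR {
    a0[NR] = $0
    a1[NR] = $1
    av[NR] = ($4 + $5) / 2
    next
}
# Second file (fileB): print every stored fileA line with the same key
# whose mean lies strictly between $2 and $3.
{
    for (i = 1; i in a0; ++i) {
        if (a1[i] == $1 && av[i] > $2 && av[i] < $3) {
            print $0, a0[i]
        }
    }
}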

Run:

awk -f script.awk fileA fileB

Output:

chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200
answered 2013-09-20T19:53:53.040

If you're flexible about the output format:

join fileB fileA | awk '$2 < $NF && $NF < $3' 
chr1 500 1000 Mot TF 500 700 0.9893 target1 600
chr1 400 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 500 700 0.9893 target1 600
chr1 100 800 Mot TF 100 300 0.9893 target1 200

join does not print the join column twice. I'm assuming the last field of fileA is already the average.
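
If the last field of fileA cannot be trusted to hold the average, the same join approach can compute it instead. The following is only a sketch under a few assumptions: the sample rows are copied from the question into a scratch directory, the files are tab-separated, and both inputs are sorted on the first field first, because join expects sorted input. After the join, fileA's $4 and $5 end up as fields 6 and 7.

```shell
# Sample data copied from the question (assumed tab-separated).
dir=$(mktemp -d)
printf 'chr1\tMot\tTF\t500\t700\t0.9893\ttarget1\t600\n'   >  "$dir/fileA"
printf 'chr1\tMot\tTF\t100\t300\t0.9893\ttarget1\t200\n'   >> "$dir/fileA"
printf 'chr1\tMot\tTF\t1000\t2000\t0.9893\ttarget1\t1500\n' >> "$dir/fileA"
printf 'chr2\tMot\tTF\t500\t700\t0.9502\ttarget2\t600\n'   >> "$dir/fileA"
printf 'chr1\t500\t1000\nchr1\t400\t800\nchr1\t100\t800\nchr3\t100\t500\n' > "$dir/fileB"

# join expects both inputs sorted on the join field (field 1 here).
LC_ALL=C sort -k1,1 "$dir/fileA" > "$dir/fileA.sorted"
LC_ALL=C sort -k1,1 "$dir/fileB" > "$dir/fileB.sorted"

# Joined layout: key, fileB $2, fileB $3, then fileA $2..$8,
# so fileA $4 and $5 are fields 6 and 7 of each joined line.
joined=$(LC_ALL=C join "$dir/fileB.sorted" "$dir/fileA.sorted" |
    awk '
        { avg = ($6 + $7) / 2 }
        $2 < avg && avg < $3
    ')
printf '%s\n' "$joined"
rm -rf "$dir"
```

Sorting changes the order of the lines, so the matches may come out in a different order than in the desired result.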

Otherwise:

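awk -v OFS='\t' '
    # First file (fileB): key each line by its full text; keep its chromosome and bounds.
    NR==FNR {f1[$0] = $1; min[$0] = $2; max[$0] = $3; next}
    # Second file (fileA): test the mean of $4 and $5 against every stored interval.
    {
        avg=($4+$5)/2
        # for-in order is unspecified, so the output order can vary.
        for (b in f1) {
            if ($1 == f1[b] && min[b] < avg && avg < max[b]) {
                print b, $0
            }
        }
    }
' fileB fileA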
chr1    100 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    500 1000    chr1    Mot TF  500 700 0.9893  target1 600
chr1    400 800 chr1    Mot TF  500 700 0.9893  target1 600
chr1    100 800 chr1    Mot TF  100 300 0.9893  target1 200
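
A variant that groups the fileB intervals by chromosome may be worth considering. It only loops over intervals that share the key, keeps duplicate fileB lines (keying on $0 collapses them), and gives a deterministic order. This is a sketch, not a tested drop-in: the array names n, row, lo and hi are made up for illustration, and the sample data is copied from the question into a scratch directory, assuming tab-separated files.

```shell
# Sample data copied from the question (assumed tab-separated).
dir=$(mktemp -d)
printf 'chr1\tMot\tTF\t500\t700\t0.9893\ttarget1\t600\n'   >  "$dir/fileA"
printf 'chr1\tMot\tTF\t100\t300\t0.9893\ttarget1\t200\n'   >> "$dir/fileA"
printf 'chr1\tMot\tTF\t1000\t2000\t0.9893\ttarget1\t1500\n' >> "$dir/fileA"
printf 'chr2\tMot\tTF\t500\t700\t0.9502\ttarget2\t600\n'   >> "$dir/fileA"
printf 'chr1\t500\t1000\nchr1\t400\t800\nchr1\t100\t800\nchr3\t100\t500\n' > "$dir/fileB"

result=$(awk -v OFS='\t' '
    # First file (fileB): number the intervals seen for each chromosome.
    NR == FNR {
        n[$1]++
        row[$1, n[$1]] = $0
        lo[$1, n[$1]]  = $2
        hi[$1, n[$1]]  = $3
        next
    }
    # Second file (fileA): check only the intervals stored under the same key.
    {
        avg = ($4 + $5) / 2
        for (i = 1; i <= n[$1]; i++)
            if (lo[$1, i] < avg && avg < hi[$1, i])
                print row[$1, i], $0
    }
' "$dir/fileB" "$dir/fileA")
printf '%s\n' "$result"
rm -rf "$dir"
```

Matches are printed in fileA order, and within one fileA line in fileB order.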
answered 2013-09-20T21:30:47.820