
I am trying to split a file into different smaller files depending on the value of the fifth field. A very nice way to do this was already suggested and also here.

However, I am trying to incorporate this into a .sh script for qsub, without much success.

The problem is in the part that names the output file for each line, i.e.

f = "Alignments_" $5 ".sam"
print > f

There I need to use a variable declared earlier in the script, which specifies the directory where the file should be written. That variable is built for each task when I send out the array job for multiple files.

So say output_path=./Sample1

I need to write something like

f = $output_path "/Alignments_" $5 ".sam"
print > f

But it does not seem to like having a $variable that is not a $field belonging to awk. I don't even think it likes having two "strings" before and after the $5.

The error I get back is that it takes the first line of the file to be split (little.sam) and tries to name f like that, followed by /Alignments_" $5 ".sam" (those last three put together correctly). It says, naturally, that it is too big a name.

How can I write this so it works?

Thanks!

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    f = "Alignments_" $5 ".sam"
    print > f
} ' Tile_Number_List.txt little.sam

UPDATE, after adding -v to awk and declaring the variable opath

input=$1
outputBase=${input%.bam}

mkdir -v $outputBase\_TEST

newdir=$outputBase\_TEST

samtools view -h $input | awk 'NR >= 18' | awk -F '[\t:]' -v opath="$newdir" '

FNR == NR {
    num[$1]
    next
}

$5 in num {
    f = newdir"/Alignments_"$5".sam";
    print > f
} ' Tile_Number_List.txt -

mkdir: created directory `little_TEST'
awk: cmd. line:10: (FILENAME=- FNR=1) fatal: can't redirect to `/Alignments_1101.sam' (Permission denied)
2 Answers

To pass the value of a shell variable such as $output_path to awk, you need to use the -v option.

$ output_path=./Sample1/

$ awk -F '[:\t]' -v opath="$output_path" '
    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
    $5 in num {
        f = opath"Alignments_"$5".sam"
        print > f
    } ' Tile_Number_List.txt little.sam
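
To try the -v approach without touching real data, here is a minimal self-contained sketch. The temporary directory, the file names tiles.txt and little.sam, and the sample records are all made up for the demo; only the awk program follows the one above.

```shell
#!/bin/sh
# Demo data only: every path and record below is hypothetical.
workdir=$(mktemp -d)
output_path="$workdir/Sample1/"
mkdir -p "$output_path"

# Tile list: one tile number per line.
printf '1101\n1102\n' > "$workdir/tiles.txt"

# Input lines: with -F '[:\t]' the tile number is field 5.
printf 'M1:7:FC:1:1101:15:20\tread_a\n' >  "$workdir/little.sam"
printf 'M1:7:FC:1:1102:16:21\tread_b\n' >> "$workdir/little.sam"
printf 'M1:7:FC:1:9999:17:22\tread_c\n' >> "$workdir/little.sam"

awk -F '[:\t]' -v opath="$output_path" '
    FNR == NR { num[$1]; next }   # first file: remember the wanted tiles
    $5 in num {                   # second file: keep only known tiles
        f = opath "Alignments_" $5 ".sam"
        print > f
    }
' "$workdir/tiles.txt" "$workdir/little.sam"

# List the files that were written.
ls "$output_path"
```

Note that awk processes backslash escapes in a value assigned with -v, so a directory name containing backslashes would need extra care.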

Also, your script still has the error from your previous question.

EDIT:

The awk variable created with -v is opath, but you are using newdir. What you want is:

input=$1
outputBase=${input%.bam}
mkdir -v $outputBase\_TEST
newdir=$outputBase\_TEST

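samtools view -h "$input" | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR {
    num[$1]
    next
}
FNR >= 18 && $5 in num {    # FNR >= 18 skips the first 17 lines of the samtools output
    f = opath"/Alignments_"$5".sam"   # <-- opath is the awk variable not newdir
    print > f
}' Tile_Number_List.txt -

You should also move the NR >= 18 test into the second awk script instead of running a separate awk for it. FNR is used here because it restarts at 1 for the piped input.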
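
The "can't redirect to `/Alignments_1101.sam'" error is what you would expect from an unset awk variable: it evaluates to the empty string, so the path starts at the root directory. A small sketch of that behaviour, with the directory name little_TEST taken from your mkdir output and the file name used purely as an illustration:

```shell
newdir="little_TEST"

# opath is set with -v. newdir exists only in the shell, so inside awk it is
# an unset variable and evaluates to the empty string.
awk -v opath="$newdir" 'BEGIN {
    print "with opath:  " opath "/Alignments_1101.sam"
    print "with newdir: " newdir "/Alignments_1101.sam"
}'
```

The second line printed begins with a bare "/", which matches the file name in the error message.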

answered 2013-05-06T21:44:38.347

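awk variables are like C variables - just reference them by name to get their value; there is no need to stick a "$" in front of them as you would with shell variables: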

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    output_path = "./Sample1/"
    f = output_path "Alignments_" $5 ".sam"
    print > f
} ' Tile_Number_List.txt little.sam
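
This also explains the "name too big" error in the question. In awk, "$" is the field operator, so $output_path with an unset output_path means $0, the whole input line. A short sketch, using a made-up input line in place of the first line of little.sam:

```shell
# Hypothetical one-line input standing in for the first line of little.sam.
line='first line of the file'

# output_path is unset inside awk, so it counts as 0 in a numeric context
# and $output_path is $0, the entire current record.
printf '%s\n' "$line" | awk '{ print $output_path "/Alignments_.sam" }'

# With n set to 2, $n is the second field.
printf '%s\n' "$line" | awk -v n=2 '{ print $n }'
```

The first command prints the whole line followed by the file name suffix, and the second prints only the second field.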
answered 2013-05-06T21:46:21.500