I am trying to extract a subset of my tab-delimited data based on information stored inside one column. For example, column 2 holds three scores separated by ";":

col1 col2
1    a=2;b=1.1;c=0    
1    a=0.2;b=0.2;c=0.5  
1    a=1.5;b=1.9;c=3.5  

I would like to extract the rows whose b value is greater than 1. In this case my desired output would be:

col1 col2
1    a=2;b=1.1;c=0    
1    a=1.5;b=1.9;c=3.5  

I tried awk, but I could not get it to look at the values inside the column. Also, the order of the keys is not always the same (a, b, c, etc.), so it would be best to put 'b > 1' directly into the search criteria. Any suggestions?
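
The sample above is displayed with spaces, but the answers below split fields on a real tab (-F'\t'). A minimal sketch for recreating the sample as a genuinely tab-delimited file, assuming it is simply named "file" as in the answers:

```shell
# Recreate the sample data with a real tab between the two columns.
# The file name "file" is an assumption matching the answers below.
printf 'col1\tcol2\n' > file
printf '1\ta=2;b=1.1;c=0\n' >> file
printf '1\ta=0.2;b=0.2;c=0.5\n' >> file
printf '1\ta=1.5;b=1.9;c=3.5\n' >> file

# Sanity check: every line should split into exactly 2 tab-separated
# fields, so this prints nothing when the file is well formed.
awk -F'\t' 'NF != 2 { print "line " NR ": " NF " fields" }' file
```

If the check prints anything, the columns are probably separated by spaces rather than a tab.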

2 Answers

Since the order of the keys in column 2 can be random, you can split the column into keys and values and look for the key b:

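awk -F'\t' '
NR>1 {
    n = split($2, ary, /[;=]/)         # "a=2;b=1.1;c=0" -> a, 2, b, 1.1, c, 0
    for (i = 1; i < n; i += 2) {       # odd indices are keys, even ones values
        if (ary[i] == "b" && ary[i+1] + 0 > 1) {
            print                      # print the row once, then stop looking
            break
        }
    }
    next
}
1' file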

Test:

$ cat f
col1    col2
1       a=2;b=1.1;c=0    
1       a=0.2;b=0.2;c=0.5  
1       a=1.5;b=1.9;c=3.5  

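$ awk -F'\t' '
NR>1 {
    n = split($2, ary, /[;=]/)         # "a=2;b=1.1;c=0" -> a, 2, b, 1.1, c, 0
    for (i = 1; i < n; i += 2) {       # odd indices are keys, even ones values
        if (ary[i] == "b" && ary[i+1] + 0 > 1) {
            print                      # print the row once, then stop looking
            break
        }
    }
    next
}
1' f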
col1    col2
1       a=2;b=1.1;c=0    
1       a=1.5;b=1.9;c=3.5  
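
The question asks for the criterion itself ('b > 1') to be part of the search. A sketch that passes the key and the threshold in with awk -v, so one script works for any key; the function name filter_by_key and its argument order are made up for illustration. It splits each key=value pair on its own instead of splitting on both delimiters at once:

```shell
# Hypothetical helper: filter_by_key FILE KEY MIN
# Prints the header plus every row whose column-2 entry KEY=value has value > MIN.
filter_by_key() {
    awk -F'\t' -v key="$2" -v min="$3" '
        NR == 1 { print; next }                 # keep the header line
        {
            n = split($2, pairs, ";")           # "a=2;b=1.1;c=0" -> "a=2" "b=1.1" "c=0"
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")        # "b=1.1" -> kv[1]="b", kv[2]="1.1"
                if (kv[1] == key && kv[2] + 0 > min + 0) { print; break }
            }
        }' "$1"
}

# Example: recreate the sample data and keep rows whose b value is greater than 1
printf 'col1\tcol2\n1\ta=2;b=1.1;c=0\n1\ta=0.2;b=0.2;c=0.5\n1\ta=1.5;b=1.9;c=3.5\n' > file
filter_by_key file b 1
```

With the same data, filter_by_key file c 1 would keep the header and only the last row (c=3.5). Adding 0 to both sides makes awk compare the values as numbers rather than strings.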
answered 2013-06-10T15:27:48.227

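GNU sed (-r enables extended regular expressions), deleting every line whose b value is 1 or less so that only the rows with b > 1 remain:

sed -r '/(^|[;[:space:]])b=(0(\.[0-9]*)?|1(\.0*)?)([;[:space:]]|$)/d' file

The dots are escaped so that values such as b=1.05 or b=100 are not deleted by mistake. Because the regex matches text rather than comparing numbers, it assumes b holds a plain non-negative decimal such as 0.5, 1, 1.0 or 1.9; signed or exponent values need the awk approach.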
answered 2013-06-10T16:22:07.130
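
If several keys have to be tested at once, one option (not taken from either answer) is to load all key=value pairs of column 2 into an awk array first and then write the condition numerically. A minimal sketch, where the function name select_rows and the example condition b > 1 && c < 1 are illustrative assumptions:

```shell
# Hypothetical helper: prints the header plus rows matching the condition below.
select_rows() {
    awk -F'\t' '
        NR == 1 { print; next }                 # keep the header line
        {
            split("", v)                        # clear the array (portable idiom)
            n = split($2, pairs, ";")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                v[kv[1]] = kv[2] + 0            # e.g. v["b"] = 1.1
            }
            # Example condition only; replace it with the criteria you need.
            if (("b" in v) && v["b"] > 1 && ("c" in v) && v["c"] < 1) print
        }' "$1"
}

printf 'col1\tcol2\n1\ta=2;b=1.1;c=0\n1\ta=0.2;b=0.2;c=0.5\n1\ta=1.5;b=1.9;c=3.5\n' > file
select_rows file
```

Because every value is stored as a number, the key order inside column 2 does not matter, and comparisons such as 10 > 1 or 1.05 > 1 behave as expected.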