OK, so the problem is that I have a list with N rows, like this:

4.96035894  2.94014535  9.71651378 On
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 On
6.40892867  9.44195654  8.83707549 On
4.26065784  3.74966832  7.89520829 On
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off
...

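The goal is to find the X rows (X < N) with the lowest values in the third field (this extends easily to any given field, but let's focus on the third) and set the fourth field (always a string) to On or Off according to an argument supplied by the user: if the argument is On, set it to On; if it is Off, set it to Off.

In the example above, if I wanted to switch the 3 rows with the lowest third-field values to Off, the output would be: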

4.96035894  2.94014535  9.71651378 On
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 On
6.40892867  9.44195654  8.83707549 Off // this row is changed
4.26065784  3.74966832  7.89520829 Off // this row is changed
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off // this row is changed
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off
...

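I think I could do it for the specific case X=1, the row with the lowest value, but I don't know how to extend it to an arbitrary X. Maybe an array of size X that gets filled and updated while walking through the list?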
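
For reference, the X=1 case mentioned above might be sketched as follows. This is only an illustration under assumptions: the function name `lowest_one` is hypothetical, the third field is compared numerically, and the file is read twice (once to locate the minimum, once to print).

```shell
# Hypothetical helper: lowest_one STATE FILE
# Sets the last field of the row with the smallest 3rd field to STATE.
lowest_one() {
    awk -v state="$1" '
        NR == FNR {                       # first pass: remember where the minimum is
            if (FNR == 1 || $3 + 0 < min) { min = $3 + 0; row = FNR }
            next
        }
        FNR == row { sub(/[^ \t]+[ \t]*$/, state) }   # second pass: rewrite last field
        { print }
    ' "$2" "$2"
}
```

It would be called as `lowest_one Off file`. The open question is how to do the same for X rows.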

4 Answers

Interesting problem; you need to be a bit clever with arrays:

BEGIN {
    if (!x)                           # If x wasn't set using -v default is 3
        x=3
    if (!field)                       # If field wasn't set using -v default is 3
        field=3
}
{
    lines[NR]=$0                                    # Store each line in an array
    sort[NR]=$field                                 # Store the field in an array
    field_a[$field]=$0                              # Line lookup on field 
}
END{
    asort(sort)                                     # Sort the fields  

    for (j=1;j<=NR;j++) {                           # For every line in the file
        for(i=1;i<=x;i++) {                         # For the x lowest values
            if (lines[j] == field_a[sort[i]]) {     # If current line is among the x lowest
                sub(/On/,"Off",lines[j])            # Do the substitution
                break                               # Grab the next line
            }
        }
        print lines[j]                              # print the line
    }
}

Save it to a file named script.awk and run it like this:

$ awk -f script.awk file
4.96035894  2.94014535  9.71651378 On
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 On
6.40892867  9.44195654  8.83707549 Off
4.26065784  3.74966832  7.89520829 Off
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off

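By default it turns off the 3 lowest values in field 3, but you can use the -v option to specify the field and the number of values. For example, let's turn off the 10 lowest values in field 3, leaving only the largest value on: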

$ awk -v x=10 -f script.awk file
4.96035894  2.94014535  9.71651378 Off
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 Off
6.40892867  9.44195654  8.83707549 Off
4.26065784  3.74966832  7.89520829 Off
8.89601431  9.84208918  9.63054539 Off
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off

And likewise for field 2, leaving only its largest value on:

$ awk -v x=10 -v field=2 -f script.awk file
4.96035894  2.94014535  9.71651378 Off
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 Off
6.40892867  9.44195654  8.83707549 Off
4.26065784  3.74966832  7.89520829 Off
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 Off
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off

Note: using the asort() function requires GNU awk.
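
The script above hard-codes the On to Off substitution and looks lines up by field value, so rows that share a value may not all be picked. A minimal sketch of a variant follows. It takes the target state as a parameter, tracks rows by line number, and avoids asort() so it should also run on a non-GNU awk. The wrapper name `set_lowest` and the `state` variable are assumptions made for illustration, not part of the original script. Selection is done by repeated minimum scans, which is simple but costs on the order of N times x comparisons.

```shell
# Hypothetical wrapper: set_lowest X FIELD STATE FILE
set_lowest() {
    awk -v x="$1" -v field="$2" -v state="$3" '
        { line[NR] = $0; val[NR] = $field + 0 }     # keep every row and its numeric key
        END {
            if (x > NR) x = NR
            for (k = 1; k <= x; k++) {               # pick the smallest unpicked row, x times
                best = 0
                for (j = 1; j <= NR; j++)
                    if (!(j in picked) && (best == 0 || val[j] < val[best]))
                        best = j
                picked[best] = 1
            }
            for (j = 1; j <= NR; j++) {
                if (j in picked)
                    sub(/[^ \t]+[ \t]*$/, state, line[j])   # rewrite the last field only
                print line[j]
            }
        }
    ' "$4"
}
```

For example, `set_lowest 3 3 Off file` would mirror the default behaviour shown above, and `set_lowest 3 3 On file` would switch the same rows back on. When values tie, the earlier row is picked first.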

Answered 2013-05-04T12:56:38.107

Something like this would work:

x=3
f=3
awk -v f="$f" '{print $f, NR, $0}' file |
sort -n |
awk -v x="$x" 'NR<=x{sub(/On/,"Off")} {print}' |
sort -k2n |
awk '{sub(/[^ ]+ +[^ ]+ +/,""); print}'

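f is the field to sort on and x is the number of lowest values to flag.

You could do it all in awk using an insertion sort or gawk's built-in sort functions asort()/asorti(), but the above is simple and I'm lazy...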
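
The all-awk route mentioned here could look roughly like the sketch below. It uses a bounded insertion sort that keeps only the x smallest values seen so far, together with their row numbers. The wrapper name `mark_lowest` and the `state` parameter are hypothetical, and values are assumed to be numeric. It is one possible reading of the insertion-sort idea, not the answer's own code.

```shell
# Hypothetical wrapper: mark_lowest X FIELD STATE < input
mark_lowest() {
    awk -v x="$1" -v f="$2" -v state="$3" '
        {
            line[NR] = $0
            v = $f + 0
            # key[1..cnt] holds the smallest values so far in ascending order,
            # top[1..cnt] the matching row numbers.
            if (cnt < x || v < key[cnt]) {
                if (cnt < x) cnt++                  # still room: open a new slot
                i = cnt                             # otherwise the largest entry is dropped
                while (i > 1 && key[i-1] > v) {     # shift larger entries up one slot
                    key[i] = key[i-1]; top[i] = top[i-1]
                    i--
                }
                key[i] = v; top[i] = NR
            }
        }
        END {
            for (i = 1; i <= cnt; i++) hit[top[i]] = 1
            for (j = 1; j <= NR; j++) {
                if (j in hit) sub(/[^ \t]+[ \t]*$/, state, line[j])
                print line[j]
            }
        }
    '
}
```

Usage would be along the lines of `mark_lowest 3 3 Off < file`. Because only x entries are kept sorted, each input row costs at most x shifts, though all lines are still held in memory so they can be printed in their original order.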

$ x=3; f=3; awk -v f="$f" '{print $f, NR, $0}' file | sort -n | awk -v x="$x" 'NR<=x{sub(/On/,"Off")} {print}' | sort -k2n | awk '{sub(/[^ ]+ +[^ ]+ +/,""); print}'
4.96035894  2.94014535  9.71651378 On
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 On
6.40892867  9.44195654  8.83707549 Off
4.26065784  3.74966832  7.89520829 Off
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off

$ x=4; f=2; awk -v f="$f" '{print $f, NR, $0}' file | sort -n | awk -v x="$x" 'NR<=x{sub(/On/,"Off")} {print}' | sort -k2n | awk '{sub(/[^ ]+ +[^ ]+ +/,""); print}'
4.96035894  2.94014535  9.71651378 Off
8.37470259  9.08139103 10.23145322 Off
5.73085411  4.21656546  9.98718707 Off
6.40892867  9.44195654  8.83707549 On
4.26065784  3.74966832  7.89520829 Off
8.89601431  9.84208918  9.63054539 On
9.10538764  8.58408119 10.87454882 On
6.21494725  4.61164407  9.08378204 Off
7.62256424  9.59449339 10.84506558 Off
6.49210768  4.03768151 10.75221925 Off
5.04079861  4.99362253 10.34349177 Off
Answered 2013-05-04T12:54:25.413

And another approach:

n=4
field=3
newval=FOO
# find the line numbers that need to be updated
set -- $(
    cat -n file |
    sort -nk $((++field)),$field |
    awk -v n=$n 'FNR <= n {print $1}'
)
# now, update the value for the specific lines
awk -v val="$newval" -v lines=" $* " 'lines ~ " "FNR" " {$NF = val} 1' file
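
One thing to be aware of: assigning to $NF makes awk rebuild the record with single spaces between fields, so the changed lines lose their original column alignment. The sketch below keeps the same two-step idea (find the line numbers, then update those lines) but substitutes the last field in place. The function name `update_lowest` is hypothetical, `awk '{print NR, $0}'` stands in for `cat -n`, and `LC_ALL=C` is set on the assumption that the data uses `.` as the decimal separator.

```shell
# Hypothetical wrapper: update_lowest N FIELD NEWVAL FILE
update_lowest() {
    n=$1 field=$2 newval=$3 file=$4
    # Prefixing the line number shifts every field one place right, hence field + 1.
    key=$((field + 1))
    # Line numbers of the n rows with the smallest value in the chosen field.
    lines=$(awk '{print NR, $0}' "$file" |
            LC_ALL=C sort -k "$key,${key}n" |
            awk -v n="$n" 'NR <= n {printf "%s ", $1}')
    # Substitute the last field in place so the original spacing survives.
    awk -v val="$newval" -v lines=" $lines" \
        'index(lines, " " FNR " ") { sub(/[^ \t]+[ \t]*$/, val) } 1' "$file"
}
```

For instance, `update_lowest 4 3 FOO file` corresponds to the settings used above.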
Answered 2013-05-04T14:21:33.570

Yet another approach, reading the file twice and sorting as it goes:

awk '
  NR==FNR{
    S[0]=$field
    # sort the value into place
    for(i=1;i<=n;i++){
      if(S[i-1]>S[i]){
        c=S[i-1]
        S[i-1]=S[i]
        S[i]=c
      }
    }
    # shift the highest value into oblivion
    if(NR>n) for(i=n; i>=1; i--) S[i]=S[i-1]
    next
  }
  # Create associative array entries for the values 
  FNR==1 {
    for(i=1;i<=n;i++){
      A[S[i]]
    }
  }
  # if $field is one of the values then change the last field (assuming there are no other fields with value of $NF)
  $field in A {
    sub($NF,"Off")
  }
  1
' n=3 field=3 file file
Answered 2013-05-04T16:35:53.143