bash - How to combine columns with the same header in one file using Awk or Bash

Question

I would like to know how to combine columns that have duplicate headers in a file, using bash/sed/awk.

   x y  x  y
s1 3 4  6 10
s2 3 9 10  7
s3 7 1  3  2

to:

   x y
s1 9 14
s2 13 16
s3 10 3

score 5 · Accepted Answer

$ cat file
   x y  x  y
s1 3 4  6 10
s2 3 9 10  7
s3 7 1  3  2

$ cat tst.awk
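NR==1 {
   # Map each header name to the data-field numbers that carry it.
   # Data rows start with a row label, so header field i is data field i+1.
   for (i=1;i<=NF;i++) {
      flds[$i] = flds[$i] " " i+1
   }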
   printf "%-3s",""
   for (hdr in flds) {
      printf "%3s",hdr
   }
   print ""
   next
}
{
   printf "%-3s",$1
   for (hdr in flds) {
      n = split(flds[hdr],fldNrs)
      sum = 0
      for (i=1; i<=n; i++) {
         sum += $(fldNrs[i])
      }
      printf "%3d",sum
   }
   print ""
}

$ awk -f tst.awk file
     x  y
s1   9 14
s2  13 16
s3  10  3

$ time awk -f ./tst.awk file
     x  y
s1   9 14
s2  13 16
s3  10  3

real    0m0.265s
user    0m0.030s
sys     0m0.108s

You can adjust the printf lines in the obvious way for different output formatting if you like.
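As one possible adjustment, here is a minimal sketch that emits tab-separated output and keeps the headers in the order they first appear, rather than relying on the unspecified order of `for (hdr in flds)`. The function name `sum_same_headers` and the `order` array are hypothetical names that are not part of the answer above, and the tab-separated layout is an assumption about what a different output format might look like.

```shell
# Hypothetical helper: sum columns that share a header, keeping first-seen header order.
sum_same_headers() {
  awk '
    NR == 1 {
      # Header field i corresponds to data field i+1 (data rows start with a label).
      for (i = 1; i <= NF; i++) {
        if (!($i in flds)) order[++n] = $i      # remember first-seen order
        flds[$i] = flds[$i] " " (i + 1)
      }
      line = ""
      for (k = 1; k <= n; k++) line = line "\t" order[k]
      print line
      next
    }
    {
      line = $1
      for (k = 1; k <= n; k++) {
        cnt = split(flds[order[k]], nrs, " ")   # data-field numbers for this header
        sum = 0
        for (j = 1; j <= cnt; j++) sum += $(nrs[j])
        line = line "\t" sum
      }
      print line
    }
  ' "$@"
}
```

Called as `sum_same_headers file`, it reads the same input as tst.awk; with no argument it reads standard input.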

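Here is the bash equivalent, in response to comments elsethread. Do not use it; the awk solution is the right one. This is only to show how you should write it in bash if you wanted to do that for some inexplicable reason: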

$ cat tst.sh
declare -A flds
while IFS= read -r rec
do
   lineNr=$(( lineNr + 1 ))
   set -- $rec

   if (( lineNr == 1 ))
   then

      fldNr=1
      for fld
      do
         fldNr=$(( fldNr + 1 ))
         flds[$fld]+=" $fldNr"
      done
      printf "%-3s" ""
      for hdr in "${!flds[@]}"
      do
         printf "%3s" "$hdr"
      done
      printf "\n"

   else

      printf "%-3s" "$1"
      for hdr in "${!flds[@]}"
      do
         fldNrs=( ${flds[$hdr]} )
         sum=0
         for fldNr in "${fldNrs[@]}"
         do
            # Indirect expansion; eval "\$$fldNr" would misread field numbers of 10 and above
            val=${!fldNr}
            sum=$(( sum + val ))
         done
         printf "%3d" "$sum"
      done
      printf "\n"

   fi

done < "$1"
$
$ time ./tst.sh file
     x  y
s1   9 14
s2  13 16
s3  10  3

real    0m0.062s
user    0m0.031s
sys     0m0.046s

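Note that its run time is about the same order of magnitude as the awk script's (see the comments elsethread). Caveat: I never write bash scripts to process text files, so I am not claiming the bash script above is perfect. It is just an example of how to approach this in bash, for comparison with the other scripts in this thread that I claimed should be rewritten!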

score 1 · Accepted Answer

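This is not a one-liner. You can do it with Bash v4, Bash's dictionaries, and a few shell tools.

Run the script below with the name of the file to process as its single argument:

bash script_below.sh your_file

Here is the script: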

declare -A coltofield
headerdone=0

# Take the first line of the input file and extract all fields 
# and their position. Start with position value 2 because of the 
# format of the following lines

while read line; do
    colnum=$(echo $line | cut -d "=" -f 1)
    field=$(echo $line | cut -d "=" -f 2)

    coltofield[$colnum]=$field
done < <(head -n 1 "$1" | sed  -e 's/^[[:space:]]*//;' -e 's/[[:space:]]*$//;' -e 's/[[:space:]]\+/\n/g;' | nl -v 2 -n ln  | sed -e 's/[[:space:]]\+/=/g;')

# Read the rest of the file starting with the second line             
while read line; do
    declare -A computation
    declare varname


    # Turn the line in key value pair. The key is the position of 
    # the value in the line
    while read value; do
        vcolnum=$(echo $value | cut -d "=" -f 1)
        vvalue=$(echo $value | cut -d "=" -f 2)

        # The first value is the line variable name 
        # (s1, s2)                                       
        if [[ $vcolnum == "1" ]]; then
            varname=$vvalue
            continue
        fi

        # Get the name of the field by the column 
        # position                                                     
        field=${coltofield[$vcolnum]}

        # Add the value to the current sum for this field
        computation[$field]=$((computation[$field]+${vvalue}))
    done < <(echo $line | sed  -e 's/^[[:space:]]*//;' -e 's/[[:space:]]*$//;' -e 's/[[:space:]]\+/\n/g;' | nl -n ln  | sed -e 's/[[:space:]]\+/=/g;')


    if [[ $headerdone == "0" ]]; then
        echo -e -n "\t"
        for key in ${!computation[@]}; do echo -n -e "$key\t" ; done; echo
        headerdone=1
    fi

    echo -n -e "$varname\t"
    for value in ${computation[@]}; do echo -n -e "$value\t"; done; echo

    computation=()

done < <(tail -n +2 "$1")

score 1 · Accepted Answer

Another AWK alternative:

$ cat f
   x y  x  y
s1 3 4  6 10
s2 3 9 10  7
s3 7 1  3  2

$ cat f.awk
BEGIN {
OFS="\t";
}

NR==1 {
  #need header for 1st column
  for(f=NF; f>=1; --f)
    $(f+1) = $f;
  $1="";

  for(f=1; f<=NF; ++f)
    fld2hdr[f]=$f;
}

{
  for(f=1; f<=NF; ++f)
    if($f ~ /^[0-9]/)
      colValues[fld2hdr[f]]+=$f;
    else
      colValues[fld2hdr[f]]=$f;

  for (i in colValues)
    row = row colValues[i] OFS;
  print row;

  split("", colValues);
  row=""
}

$ awk -f f.awk f
        x       y
s1      9       14
s2      13      16
s3      10      3

score 0 · Accepted Answer

$ awk 'BEGIN{print "   x y"} a=$2+$4, b=$3+$5 {print $1, a, b}' file
   x y
s1 9 14
s2 13 16
s3 10 3

No doubt there is a better way to display the header, but my awk is a bit sketchy.
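A hedged variant of the same fixed-position idea: in the one-liner above, `a=$2+$4, b=$3+$5` is parsed by awk as a range pattern, so a row whose x sum is 0 would be skipped. The sketch below swaps that for a plain action and replaces the header row when `NR == 1` instead of printing it in `BEGIN`. The function name `sum_fixed_columns` is hypothetical, and the code still assumes exactly the x y x y layout shown in the question.

```shell
# Hypothetical wrapper around a fixed-position variant of the one-liner.
sum_fixed_columns() {
  awk '
    NR == 1 { print "   x y"; next }   # replace the header row with the combined one
    { print $1, $2 + $4, $3 + $5 }     # x = fields 2 and 4, y = fields 3 and 5
  ' "$@"
}
```

Called as `sum_fixed_columns file`, it prints every data row, including rows where a sum is zero.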

score 0 · Accepted Answer

Here is a Perl solution, just for fun:

cat table.txt | perl -e'@h=grep{$_}split/\s+/,<>;while(@l=grep{$_}split/\s+/,<>){for$i(1..$#l){$t{$l[0]}{$h[$i-1]}+=$l[$i]}};printf "    %s\n",(join"  ",sort keys%{$t{(keys%t)[0]}});for$h(sort keys%t){printf"$h %s\n",(join " ",map{sprintf"%2d",$_}@{$t{$h}}{sort keys%{$t{$h}}})};'
