I'd like to know how to use bash/sed/awk to combine (sum) the columns in a file that share the same header. For example, from:
x y x y
s1 3 4 6 10
s2 3 9 10 7
s3 7 1 3 2
to:
x y
s1 9 14
s2 13 16
s3 10 3
$ cat file
x y x y
s1 3 4 6 10
s2 3 9 10 7
s3 7 1 3 2
$ cat tst.awk
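NR==1 {
    # Map each header name to the list of data-field numbers that carry it.
    # Data rows start with a row label, so header i lines up with field i+1.
    for (i=1;i<=NF;i++) {
        flds[$i] = flds[$i] " " (i+1)
    }
    printf "%-3s",""
    # Note: "for (hdr in flds)" visits the headers in an unspecified order.
    for (hdr in flds) {
        printf "%3s",hdr
    }
    print ""
    next
}
{
    printf "%-3s",$1
    for (hdr in flds) {
        n = split(flds[hdr],fldNrs)
        sum = 0
        for (i=1; i<=n; i++) {
            sum += $(fldNrs[i])
        }
        printf "%3d",sum
    }
    print ""
}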
$ awk -f tst.awk file
x y
s1 9 14
s2 13 16
s3 10 3
$ time awk -f ./tst.awk file
x y
s1 9 14
s2 13 16
s3 10 3
real 0m0.265s
user 0m0.030s
sys 0m0.108s
If you like, you can tweak the printf lines in the obvious way to get a different output format.
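As one illustration of adjusting the output, here is a minimal sketch that emits tab-separated columns and keeps the headers in the order they first appear, instead of relying on the unspecified order of awk's for-in loop. The function name sum_dup_cols and the arrays order and numHdrs are hypothetical names introduced for this sketch, not part of the script above.

```shell
# Hypothetical helper: sum the columns that share a header, keep headers in
# first-seen order, and write tab-separated output.
sum_dup_cols() {
    awk '
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            if (!($i in flds)) order[++numHdrs] = $i   # remember first-seen order
            flds[$i] = flds[$i] " " (i + 1)            # data fields are shifted by the row label
        }
        line = ""
        for (h = 1; h <= numHdrs; h++) line = line "\t" order[h]
        print line
        next
    }
    {
        line = $1
        for (h = 1; h <= numHdrs; h++) {
            n = split(flds[order[h]], fldNrs, " ")
            sum = 0
            for (i = 1; i <= n; i++) sum += $(fldNrs[i])
            line = line "\t" sum
        }
        print line
    }' "$@"
}
```

It can be called as sum_dup_cols file, or fed from a pipe.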
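Here is the bash equivalent, in response to comments elsewhere in this thread. Do NOT use it; the awk solution is the right one. This is only to show how you should write it in bash if, for some inexplicable reason, you wanted to: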
$ cat tst.sh
#!/usr/bin/env bash
declare -A flds
while IFS= read -r rec
do
    lineNr=$(( lineNr + 1 ))
    set -- $rec    # deliberately unquoted: split the record into $1, $2, ...
    if (( lineNr == 1 ))
    then
        fldNr=1
        for fld
        do
            fldNr=$(( fldNr + 1 ))
            flds[$fld]+=" $fldNr"
        done
        printf "%-3s" ""
        for hdr in "${!flds[@]}"
        do
            printf "%3s" "$hdr"
        done
        printf "\n"
    else
        printf "%-3s" "$1"
        for hdr in "${!flds[@]}"
        do
            fldNrs=( ${flds[$hdr]} )
            sum=0
            for fldNr in "${fldNrs[@]}"
            do
                val=${!fldNr}    # indirect expansion; unlike eval "\$$fldNr" it also works past $9
                sum=$(( sum + val ))
            done
            printf "%3d" "$sum"
        done
        printf "\n"
    fi
done < "$1"
$
$ time ./tst.sh file
x y
s1 9 14
s2 13 16
s3 10 3
real 0m0.062s
user 0m0.031s
sys 0m0.046s
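Note that it runs in roughly the same order of magnitude of time as the awk script (see the comments elsewhere in this thread). Caveat: I never write bash scripts to process text files, so I am not claiming the bash script above is perfect. It is just an example of how to approach this in bash, for comparison with the other scripts in this thread that I said should be rewritten!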
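This is not a one-liner. You can do it with Bash v4, Bash's associative arrays (dictionaries), and a few shell tools.
Run the script below with the name of the file to process as its single argument: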
bash script_below.sh your_file
Here is the script:
declare -A coltofield
headerdone=0
# Take the first line of the input file and extract all fields
# and their position. Start with position value 2 because of the
# format of the following lines
while read -r line; do
    colnum=$(echo "$line" | cut -d "=" -f 1)
    field=$(echo "$line" | cut -d "=" -f 2)
    coltofield[$colnum]=$field
done < <(head -n 1 "$1" | sed -e 's/^[[:space:]]*//;' -e 's/[[:space:]]*$//;' -e 's/[[:space:]]\+/\n/g;' | nl -v 2 -n ln | sed -e 's/[[:space:]]\+/=/g;')
# Read the rest of the file starting with the second line
while read -r line; do
    declare -A computation
    declare varname
    # Turn the line into key/value pairs. The key is the position of
    # the value in the line
    while read -r value; do
        vcolnum=$(echo "$value" | cut -d "=" -f 1)
        vvalue=$(echo "$value" | cut -d "=" -f 2)
        # The first value is the line variable name
        # (s1, s2)
        if [[ $vcolnum == "1" ]]; then
            varname=$vvalue
            continue
        fi
        # Get the name of the field by the column
        # position
        field=${coltofield[$vcolnum]}
        # Add the value to the current sum for this field
        computation[$field]=$((computation[$field]+${vvalue}))
    done < <(echo "$line" | sed -e 's/^[[:space:]]*//;' -e 's/[[:space:]]*$//;' -e 's/[[:space:]]\+/\n/g;' | nl -n ln | sed -e 's/[[:space:]]\+/=/g;')
    if [[ $headerdone == "0" ]]; then
        echo -e -n "\t"
        for key in "${!computation[@]}"; do echo -n -e "$key\t" ; done; echo
        headerdone=1
    fi
    echo -n -e "$varname\t"
    for value in "${computation[@]}"; do echo -n -e "$value\t"; done; echo
    computation=()
done < <(tail -n +2 "$1")
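To see what the sed | nl | sed pipeline in this script feeds into the while-read loops, it can help to run it on a single line. The following is a minimal sketch, assuming GNU sed (for \+ and for \n in the replacement) and nl from coreutils, as the script itself does. The wrapper name number_fields is hypothetical and not part of the script.

```shell
# Hypothetical wrapper around the pipeline used in the script: put each
# whitespace-separated field on its own line, number the lines, and join
# number and field with "=". The optional argument is the starting number.
number_fields() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/[[:space:]]\+/\n/g' |
        nl -v "${1:-1}" -n ln |
        sed -e 's/[[:space:]]\+/=/g'
}

# Header line: numbering starts at 2 because data rows have a leading label.
printf '   x y x y\n' | number_fields 2
```

With GNU tools this should print 2=x, 3=y, 4=x and 5=y on separate lines, which is the position-to-header mapping stored in coltofield.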
Another AWK alternative:
$ cat f
x y x y
s1 3 4 6 10
s2 3 9 10 7
s3 7 1 3 2
$ cat f.awk
BEGIN {
    OFS="\t";
}
NR==1 {
    #need header for 1st column
    for(f=NF; f>=1; --f)
        $(f+1) = $f;
    $1="";
    for(f=1; f<=NF; ++f)
        fld2hdr[f]=$f;
    # no "next" here: the header line falls through to the block below, which prints it
}
{
    for(f=1; f<=NF; ++f)
        if($f ~ /^[0-9]/)
            colValues[fld2hdr[f]]+=$f;
        else
            colValues[fld2hdr[f]]=$f;
    for (i in colValues)
        row = row colValues[i] OFS;
    print row;
    split("", colValues);
    row=""
}
$ awk -f f.awk f
x y
s1 9 14
s2 13 16
s3 10 3
$ awk 'BEGIN{print " x y"} NR>1{print $1, $2+$4, $3+$5}' file
x y
s1 9 14
s2 13 16
s3 10 3
No doubt there is a better way to display the header, but my awk is a bit sketchy.
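One way to avoid hard-coding the header text is to take it from the first line of the input. This is a minimal sketch, assuming (as the one-liner does) exactly two header names, each appearing twice in the same order; sum_xy is a hypothetical name used only for this example.

```shell
# Hypothetical helper: reuse the first two header names from line 1, then
# sum fields 2+4 and 3+5 on every data row.
sum_xy() {
    awk 'NR == 1 { print " ", $1, $2; next }   # header names come from the input
         { print $1, $2 + $4, $3 + $5 }' "$@"
}
```

Called as sum_xy file, it prints the header followed by one summed row per input row.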
Here is a Perl solution, just for fun:
cat table.txt | perl -e'@h=grep{length}split/\s+/,<>;while(@l=grep{length}split/\s+/,<>){for$i(1..$#l){$t{$l[0]}{$h[$i-1]}+=$l[$i]}};printf " %s\n",(join" ",sort keys%{$t{(keys%t)[0]}});for$h(sort keys%t){printf"$h %s\n",(join " ",map{sprintf"%2d",$_}@{$t{$h}}{sort keys%{$t{$h}}})};'