unix - Manipulating columns in awk

Question

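I want to manipulate some columns so that I can plot the data more easily. So far I have all the data laid out like this: the first line is the header, and each pair of lines after it holds the x data and then the y data for those points. All the numbers in the file are separated by a tab. Like this: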

200.023 468.865 567.976 647.711    ##this line is the 'header'
59.927  58.099  60.546  61.461     ##this line is x data
1576.77 2192.02 1630.22 1879.04    ##this line is y data
59.769  59.300  60.380  61.308     ##next x
1111.36 2674.2  1590.89 970.134    ##next y
##numbers continue down, and across the page

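I want to take the y coordinate (column 1, row 3) and place it in the same row as the x (column 2, row 2), separated by a tab, so that row will now be twice as long. To keep the header row from getting misaligned, I think I should add something easy to identify, such as "space" followed by a tab (\t), but anything will do. Desired output: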

200.023 space   468.865 space   567.976 space   647.711 space
59.927  1576.77 58.099  2192.02 60.546  1630.22 61.461  1879.04
59.769  1111.36 59.300  2674.2  60.380  1590.89 61.308  970.134

Let me know if I should clarify anything. Thanks.

score 3 · Accepted Answer

This calls for a script:

$ cat preprocess.awk 
BEGIN {                       # Before processing the file set OFS
    OFS="\t"                  # Set the Output Field Separator to a TAB
}
NR == 1 {                     # Process the header
    for (i=1;i<=NF;i++)       # Loop over every field in the header
        $i=$i OFS             # Add a second TAB to space headers correctly
    print $0                  # Print the header
    next                      # Get the next line in the file
}
NR % 2 {                      # For all the odd lines in the file (y lines)
    line = sep = ""           # Clear the line and separator variables
    for (i=1;i<=NF;i++) {     # Loop over the y values
        line = line sep x[i] OFS $i  # Concatenate x and y values
        sep = OFS             # Set after to avoid leading TAB
    }
    print line                # Print the line
    next                      # Get the next line in the file
}
{                             # If here then we are looking at the even lines (x)
    for (i=1;i<=NF;i++)       # Loop over the x values
        x[i] = $i             # Store values in array x
}

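This separates the headers with two tabs and collects each x and y pair onto the same line, one output line per pair of input lines.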

Usage:

With the following input:

$ cat file
h1 h2 h3 h4
x1 x2 x3 x4
y1 y2 y3 y4
x5 x6 x7 x8
y5 y6 y7 y8
x9 x10 x11 x12
y9 y10 y11 y12

it produces:

$ awk -f preprocess.awk file 
h1      h2      h3      h4  
x1  y1  x2  y2  x3  y3  x4  y4
x5  y5  x6  y6  x7  y7  x8  y8
x9  y9  x10 y10 x11 y11 x12 y12
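
One thing to watch: the script above leaves FS at its default, so awk splits on any run of whitespace, and an empty cell between two tabs would silently disappear and shift the columns after it. If that can happen in your data, setting FS to a tab as well should keep the columns aligned. A small sketch of the difference (the helper name `count_fields` is made up for illustration and is not part of the answer):

```shell
# count_fields is a hypothetical helper: it prints the number of fields awk
# sees on each input line, using the field separator passed as $1
# (an empty argument means the default whitespace splitting).
count_fields() {
    if [ -n "$1" ]; then
        awk -F "$1" '{ print NF }'
    else
        awk '{ print NF }'
    fi
}

# A row whose second cell is empty: 59.927<TAB><TAB>60.546
printf '59.927\t\t60.546\n' | count_fields ''      # default FS: prints 2
printf '59.927\t\t60.546\n' | count_fields '\t'    # FS is a tab: prints 3
```

In preprocess.awk this would amount to writing FS = OFS = "\t" in the BEGIN block, assuming the input really is strictly tab-separated.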

score 2 · Accepted Answer

If the header repeats throughout the file, then this might work:

#!/usr/bin/awk -f

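{
    m = NR % 5                      # slot within each 5-line block (the 5th line gives 0)
    for (i = 1; i <= 4; ++i) {      # assumes exactly 4 columns
        a[m, i] = $i                # remember every field of this line
    }
}
m == 0 {                            # a full block (header, x, y, x, y) has been read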
    printf("%8s %8s %8s %8s %8s %8s %8s %8s\n", "" a[1, 1], "space", a[1, 2], "space", a[1, 3], "space", a[1, 4], "space")
    printf("%8s %8s %8s %8s %8s %8s %8s %8s\n", a[2, 1], a[3, 1], a[2, 2], a[3, 2], a[2, 3], a[3, 3], a[2, 4], a[3, 4])
    printf("%8s %8s %8s %8s %8s %8s %8s %8s\n", a[4, 1], a[0, 1], a[4, 2], a[0, 2], a[4, 3], a[0, 3], a[4, 4], a[0, 4])
}

It prints this:

 200.023    space  468.865    space  567.976    space  647.711    space
  59.927  1576.77   58.099  2192.02   60.546  1630.22   61.461  1879.04
  59.769  1111.36   59.300   2674.2   60.380  1590.89   61.308  970.134

I put "space" in there literally, but you can replace those with "".
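
The script keys each row on NR % 5, which is 1 to 4 for the first four lines of a block and 0 for the fifth; that is why the second y row is read from a[0, i]. A quick way to see the mapping (the function name `show_slots` is made up for illustration):

```shell
# show_slots is a hypothetical helper: it prints each line number next to the
# slot (NR % 5) under which the script above would store that line.
show_slots() {
    awk '{ print NR, NR % 5 }'
}

# Ten dummy lines, i.e. two 5-line blocks
printf '%s\n' a b c d e f g h i j | show_slots
```

Lines 5 and 10 land in slot 0, so the three printf statements fire once per complete block.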

score 1 · Accepted Answer

Here is an awk one-liner:

$ awk 'NR==1{gsub("\t","\tspace\t");print;next}!(NR%2){split($0,a,"\t")}NR%2{for (i=1;i<=NF;i++) printf "%s\t%s\t", a[i], $i;print ""} ' file 
200.023 space   468.865 space   567.976 space   647.711
59.927  1576.77 58.099  2192.02 60.546  1630.22 61.461  1879.04 
59.769  1111.36 59.300  2674.2  60.380  1590.89 61.308  970.134

Or in a more readable format:

$ awk '
    NR==1{
        gsub("\t","\tspace\t")
        print
        next
    }
    !(NR%2){
        split($0,a,"\t")
    }
    NR%2{
        for (i=1;i<=NF;i++)
            printf "%s\t%s\t", a[i], $i
        print ""
    } ' file
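
Note that the one-liner leaves the last header field without a trailing "space" and ends each data line with an extra tab, so its output differs slightly from the layout in the question. A sketch of a variant that follows every header field with the filler and avoids the trailing tab, assuming strictly tab-separated input; the wrapper name `pair_xy` and the variable `pad` are invented here for illustration:

```shell
# pair_xy is a hypothetical wrapper; "pad" is an illustrative variable name.
# Assumes tab-separated input: one header line, then alternating x and y lines.
# The optional first argument is the header filler (defaults to "space").
pair_xy() {
    awk -v pad="${1-space}" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 {                      # header: follow every field with the filler
            for (i = 1; i <= NF; i++)
                $i = $i OFS pad
            print
            next
        }
        !(NR % 2) {                    # even lines hold x values: remember them
            split($0, x, FS)
            next
        }
        {                              # odd lines hold y values: emit x/y pairs
            line = ""
            for (i = 1; i <= NF; i++)
                line = line (i > 1 ? OFS : "") x[i] OFS $i
            print line
        }
    '
}

# Tiny two-column example
printf 'h1\th2\nx1\tx2\ny1\ty2\n' | pair_xy space
```

Passing an empty string as the argument (pair_xy '') gives a header padded with a bare extra tab instead of the word.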
