bash - Print columns from a string in bash

Question

Updated question: OK, so I have a file containing lines like this:

44:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
45:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
2:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05

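The number in the first column runs from 1 to x (45 in this example) and then starts again at 1. I want to move some of the columns into separate files. The indices of the columns I want to move are stored in the variable/array $selected_columns (2, 5 and 8 in this example), and the number of columns I want to move is stored in $number_of_columns (3 in this example).

I then want to create 45 files: one holding the selected columns of all the 1:) lines, one holding the selected columns of all the 2:) lines, and so on. I want to keep this as general as possible, because both the number of columns and the range 1 to x will change. The number x is always known, and the columns to extract are chosen by the user.

Original question:

I have a string obtained with egrep. I then want to print some of the columns (words) in that string. The positions (column indices) are known in a list in my bash script. Currently it looks like this: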

line=$(egrep " ${i}:\)" $1)

for ((j=1; j<=$number_of_columns; j++))
do
    awk $line -v current_column=${selected_columns[$j]} '{printf $(current_column)}' > "history_files/history${i}"
done

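where number_of_columns is the number of columns to print and selected_columns contains the respective indices of those columns. For example number_of_columns = 3 and selected_columns = [2 5 8], so I want to print word numbers 2, 5 and 8 of the string line to the file history${i}.

I am not sure what is going wrong, but this was put together by trial and error. The current error is awk: cannot open 0.000E+00 (No such file or directory).

Any help is appreciated!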

score 3 · Accepted Answer

I think you have to change the awk line to:

echo "$line" | awk -v current_column="${selected_columns[$j]}" ...
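
As a minimal sketch of that fix, the snippet below pipes one sample line into awk on stdin and collects the selected fields. The sample line and column list come from the question; the variable names picked and value are only illustrative, and the loop iterates over the array's real indices because bash arrays are zero-based (an assumption about how selected_columns was filled).

```shell
#!/usr/bin/env bash
# Sample line and column list taken from the question.
line='44:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02'
selected_columns=(2 5 8)

picked=""
# bash arrays are zero-based, so iterate over the valid indices directly.
for j in "${!selected_columns[@]}"; do
    # The line arrives on stdin; -v hands the field number to awk.
    value=$(printf '%s\n' "$line" |
        awk -v current_column="${selected_columns[$j]}" '{ print $current_column }')
    # Join the values with single spaces.
    picked="${picked:+$picked }$value"
done

echo "$picked"
```

Running awk once per column works but is wasteful; passing the whole list to a single awk call scales better.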

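For your updated question, assuming the columns are in the array $selected_columns: in your sample file the columns are separated by multiple adjacent spaces, so sed first squeezes them to single spaces for cut. If your real file is not like that, you can omit the sed before the grep.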

columns=`echo ${selected_columns[*]} | sed 's/ /,/g'`
for i in `seq 45`; do
    sed -e 's/  */ /g' file | grep "^$i:)" | cut -d' ' -f $columns >file-$i
done
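
The comma-separated list for cut can also be built without sed. The sketch below is one possible variant, not part of the original answer: it joins the array using IFS inside a subshell and uses tr -s instead of sed to squeeze the spaces. The variable names sample and result are made up for the illustration.

```shell
#!/usr/bin/env bash
selected_columns=(2 5 8)

# With IFS set to a comma inside the subshell, "${array[*]}" joins the
# elements with commas, which is the list format cut -f expects.
columns=$(IFS=,; echo "${selected_columns[*]}")

# One sample line from the question; tr -s squeezes runs of spaces so that
# cut sees exactly one delimiter between columns.
sample='1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05'
result=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f "$columns")

echo "$columns"
echo "$result"
```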

score 1 · Accepted Answer

In:

awk $line -v ...

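$line holds the output of grep, which is probably not what awk expects to see on its command line: left unquoted, the shell splits it into separate words, and awk treats the first one as its program text and the rest as input file names, which is why it complains that it cannot open one of your data values. Also, this: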

for ((j=1; j<=$number_of_columns; j++))
do
    anything > "history_files/history${i}"
done

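causes you to overwrite the history file on every pass through the loop. I don't know what you really want there.

Your script has a number of other problems, though. You said "e.g. number_of_columns = 3 and selected_columns = [2 5 8], so I want to print word numbers 2, 5 and 8 of the string line to the file history${i}."

This is absolutely trivial in awk, and you don't need a "grep" outside awk either, so you could do the whole thing as: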
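
To make the overwrite visible, here is a small sketch that contrasts redirecting inside the loop with redirecting the loop as a whole. The file names and the values a, b, c are placeholders for the illustration only.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)

# ">" inside the loop truncates the file on every iteration,
# so only the last value survives.
for value in a b c; do
    printf '%s\n' "$value" > "$tmpdir/overwritten"
done

# Redirecting the whole loop opens the file once and keeps every value.
for value in a b c; do
    printf '%s\n' "$value"
done > "$tmpdir/complete"

overwritten=$(cat "$tmpdir/overwritten")
# paste -s joins the lines of the file into a single line.
complete=$(paste -sd' ' "$tmpdir/complete")
echo "overwritten: $overwritten"
echo "complete: $complete"
rm -rf "$tmpdir"
```

Using ">>" inside the loop would also keep every value, but then the file has to be emptied explicitly before the loop starts.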

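# Pass the array as one space-separated string; "$selected_columns" alone
# would expand to only the first element of a bash array.
# [)] matches a literal ")" without relying on backslash escapes in -v.
awk -v pat="^ *${i}:[)]" -v selected_columns="${selected_columns[*]}" '

BEGIN { number_of_columns = split(selected_columns,selected_columnsA) }

$0 ~ pat {
    sep=""
    for (j=1;j<=number_of_columns;j++) {
        current_column = selected_columnsA[j]
        # $current_column is the field whose number is in current_column
        printf "%s%s",sep,$current_column
        sep = "\t"
    }
    print ""
}
' "$1" > "history_files/history${i}"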

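If that doesn't work for you, let's fix it rather than trying to fix the original script. It sounds like you have an enclosing loop outside of the above; chances are that could also just be part of the awk script.

Edit based on the updated OP:

I've added a lot of comments, but let me know if you have any questions: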

$ cat file
44:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
45:)   2.884E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  9.990E+02
1:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
2:)   3.593E-02  0.000E+00  0.000E+00  2.780E+02  0.000E+00  0.000E+00  1.000E+05
$
$ cat tst.sh
selected_columns=(2 5 8)

selCols="${selected_columns[@]}"

awk -v selCols="$selCols" '

BEGIN { # Executed before the first line of the input file is read

    # Split the string of selected column numbers, selCols, into
    # an array selColsA where selColsA[1] has the value of the
    # first space-separated sub-string of selCols (i.e. the number
    # of the first column to print). Note that we don't need the
    # number of columns passed into the script, because split()
    # returns the count of elements it put into the array.
    numCols = split(selCols,selColsA)
}

{ # Executed once for every line of the input file

    # Create a numeric suffix like "45" from the first column
    # in the current line of the input file, e.g. "45:)" by
    # just getting rid of all non-digit characters.
    sfx = $1
    gsub(/[^[:digit:]]/,"",sfx)

    # Create the name of the output file by attaching that
    # numeric suffix to the base value for all output files.
    #histfile = "history_files/history" sfx
    histfile = "tmp" sfx


    # Loop through every column we want printed. selColsA[<index>]
    # gives us a column number which we can then use to access the
    # columns of the current line. Awk uses the builtin variable $0
    # to hold the current line, and it automatically splits it so
    # that $1 holds the first column, $2 is the second, etc. So
    # if selColsA[1] has the value 3, then $(selColsA[1]) would be
    # the value of the 3rd column of the current input line.
    sep=""
    for (i=1;i<=numCols;i++) {
        curCol = selColsA[i]

        # Print the current column, prefixed by a tab for all but
        # the first column, and without a terminating newline so the
        # next column gets appended to the end of the current output line.
        # Note that in awk "> file" has different semantics from shell
        # and opens the file for writing the first time the line is hit
        # like "> file" in shell, but then appends to it every time it is
        # hit afterwards, like ">> file" in shell.
        printf "%s%s",sep,$curCol > histfile
        sep = "\t"
    }
    # Add a newline to the end of the current output line
    print "" > histfile
}

' "$1"
$
$ ./tst.sh file
$
$ cat tmp1
3.593E-02       2.780E+02       1.000E+05
$ cat tmp2
3.593E-02       2.780E+02       1.000E+05
$ cat tmp44
2.884E-02       2.780E+02       9.990E+02
$ cat tmp45
2.884E-02       2.780E+02       9.990E+02
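
The script relies on the way awk handles "> file", as noted in its comments. The following sketch shows that behavior in isolation: the file is truncated when awk first opens it and then appended to for the rest of the run. The file name out and the input words are placeholders.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
out="$tmpdir/out"

# Pre-fill the file so the truncation on first open is visible.
echo "old content" > "$out"

# Each of the three input lines hits the same "> file" redirection.
printf 'one\ntwo\nthree\n' | awk -v out="$out" '{ print $0 > out }'

lines=$(wc -l < "$out" | tr -d ' ')
first=$(head -n 1 "$out")
echo "$lines lines, first is $first"
rm -rf "$tmpdir"
```

The old content is gone and all three lines are present, so one awk run can fill many output files without any loop in the shell.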

By the way, I used the words "column" and "line" above for your benefit since you are just learning, but FYI the awk terms are actually "field" and "record".
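
For reference, awk exposes these concepts through builtin variables: NR is the current record number and NF is the number of fields in it. A short sketch, using two lines abbreviated from the sample data:

```shell
#!/usr/bin/env bash
# NR is the number of the current record (line); NF is the number of
# fields (columns) in it; $NF is therefore the last field.
summary=$(printf '%s\n' \
    '44:)   2.884E-02  0.000E+00  9.990E+02' \
    '45:)   2.884E-02  0.000E+00  9.990E+02' |
    awk '{ printf "record %d has %d fields, last is %s\n", NR, NF, $NF }')

echo "$summary"
```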

score 0 · Accepted Answer

I think you can use cut to do what you want, i.e.

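echo "$line" | tr -s ' ' | cut -d" " -f2,5,8 > "history_files/history${i}"

-d is your delimiter; I tested with a space, hence the " ". cut takes a single comma-separated field list and treats every space as a separator, so tr -s squeezes the runs of spaces first (this assumes the line has no leading space).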
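
One thing to watch with cut is that every single space counts as a delimiter, so in a line padded with runs of spaces most fields come out empty. A small sketch of the difference, using a shortened sample line and illustrative variable names:

```shell
#!/usr/bin/env bash
# Shortened sample line with runs of spaces between the columns.
line='44:)   2.884E-02  0.000E+00'

# Without squeezing, field 2 is the empty string between the first
# and second space.
raw=$(printf '%s\n' "$line" | cut -d' ' -f2)

# After tr -s collapses each run to one space, field 2 is the first value.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

echo "raw: [$raw]"
echo "squeezed: [$squeezed]"
```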

Hope this helps
