awk - Trying to modify awk code

Question

awk  'BEGIN{OFS=","} FNR == 1
            {if (NR > 1) {print fn,fnr,nl}
                        fn=FILENAME; fnr = 1; nl = 0}
                        {fnr = FNR}
                        /ERROR/ && FILENAME ~ /\.gz$/ {nl++}
                        {
                            cmd="gunzip -cd " FILENAME
                            cmd; close(cmd)
                         }
            END                    {print fn,fnr,nl}
        ' /tmp/appscraps/* > /tmp/test.txt

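The above scans all files in a given directory. It prints the file name, the number of lines in each file, and the number of lines found containing 'ERROR'.

I'm now trying to make the script execute a command if any file it reads isn't a regular file, i.e. if the file is a gzip file, run a particular command.

Above is my own attempt at including the gunzip command in there. Unfortunately, it isn't working. Also, I cannot gunzip all the files in the directory beforehand, because not all files in the directory will be of gzip type. Some will be regular files.

So I need the script to treat any .gz file it finds differently, so that it can read it, count and print the number of lines in it, and the number of lines matching the supplied pattern (just as it would if the file were a regular file).

Any help?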

score 1 · Accepted Answer

This part of the script makes no sense:

        {if (NR > 1) {print fn,fnr,nl}
                    fn=FILENAME; fnr = 1; nl = 0}
                    {fnr = FNR}
                    /ERROR/ && FILENAME ~ /\.gz$/ {nl++}

Let me restructure it a bit and comment it so it's clearer what it does:

{ # for every line of every input file, do the following:

    # If this is the 2nd or subsequent line, print the values of these variables:
    if (NR > 1) {
         print fn,fnr,nl
    } 

    fn = FILENAME    # set fn to FILENAME. Since this will occur for the first line of
                     # every file, this is the value fn will have when printed above,
                     # so why not just get rid of fn and print FILENAME?

    fnr = 1          # set fnr to 1. This is immediately over-written below by
                     # setting it to FNR so this is pointless.

    nl = 0

}
{ # for every line of every input file, also do the following
  # (note the unnecessary "}" then "{" above):

    fnr = FNR        # set fnr to FNR. Since this will occur for the first line of
                     # every file, this is the value fnr will have when printed above,
                     # so why not just get rid of fnr and print FNR-1?
} 

/ERROR/ && FILENAME ~ /\.gz$/ {

    nl++             # increment the value of nl. Since nl is always set to zero above,
                     # this will only ever set it to 1, so why not just set it to 1?
                     # I suspect the real intent is to NOT set it to zero above.

}

The code above also tests for a file name ending in ".gz", but the very next block then tries to run gunzip on every file.
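For context on why the `cmd; close(cmd)` lines in the question have no effect: in awk, a string used as a statement on its own is only evaluated and discarded. A command runs only when it is passed to `system()` or used in a pipe such as `cmd | getline`. The following is a minimal sketch of reading a gzip file from inside awk that way, not a recommendation over the shell approach. It assumes `gzip` is installed and that the file name contains no single quotes; the sample file and the `count_gz` function name are hypothetical, created only for the demonstration.

```shell
# Build a small gzip sample in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'ok\nERROR one\nERROR two\n' | gzip -c > "$tmpdir/sample.log.gz"

# count_gz FILE: print "FILE,lines,error_lines" for one gzip file.
# awk itself starts the decompression command via "cmd | getline".
count_gz() {
    awk -v file="$1" -v OFS=',' '
        BEGIN {
            # \047 is a single quote; assumes file contains no single quotes.
            cmd = "gzip -cd \047" file "\047"
            # Read the decompressed output one line at a time.
            while ((cmd | getline line) > 0) {
                n++
                if (line ~ /ERROR/) nl++
            }
            close(cmd)
            print file, n + 0, nl + 0
        }'
}

count_gz "$tmpdir/sample.log.gz"
```

For the sample above this prints the file path followed by `,3,2`. Note that `getline` into a variable from a command does not update `NR` or `FNR`, which is why the sketch keeps its own counter.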

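Beyond that, as others have also suggested, just call gunzip from the shell. awk is a tool for parsing text; it is not an environment from which to call other tools. That is what a shell is for.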

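For example, assuming your comment (prints the file name, number of lines in each file and number of lines found containing 'ERROR') accurately describes what you want your awk script to do, and assuming it makes sense to test for the word "ERROR" directly in a ".gz" file using awk: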

for file in /tmp/appscraps/*.gz
do
    awk -v OFS=',' '/ERROR/{nl++} END{print FILENAME, NR+0, nl+0}' "$file"
    gunzip -cd "$file"
done > /tmp/test.txt

Much clearer and simpler, isn't it?

If it doesn't make sense to test for the word ERROR directly in a ".gz" file, then you can do this instead:

for file in /tmp/appscraps/*.gz
do
    zcat "$file" | awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
    gunzip -cd "$file"
done > /tmp/test.txt

To handle both gz and non-gz files, as you have now described in your comment below:

for file in /tmp/appscraps/*
do
    case $file in
        *.gz ) cmd="zcat" ;;
        * )    cmd="cat"  ;;
    esac

    "$cmd" "$file" |
        awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'

done > /tmp/test.txt

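I left out the gunzip since, as far as I can tell from your stated requirements, you don't need it. If I'm wrong, please explain what you need it for.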
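To see what the mixed gz/non-gz loop produces, here is a small worked sketch on made-up data. The sample files and the `scan_dir` function name are hypothetical, and it is a slight variation on the loop above: the `case` statement is piped straight into awk and uses `gzip -cd` rather than `zcat`, on the assumption that `gzip` is available.

```shell
# Hypothetical sample data: one plain file and one gzip file.
dir=$(mktemp -d)
printf 'ok\nERROR a\n' > "$dir/a.log"
printf 'ERROR b\nERROR c\nok\nok\n' | gzip -c > "$dir/b.log.gz"

# scan_dir DIR: print "file,lines,error_lines" for every file in DIR.
scan_dir() {
    for file in "$1"/*
    do
        # Pick the reader by file name, then feed plain text to awk.
        case $file in
            *.gz ) gzip -cd "$file" ;;
            * )    cat "$file" ;;
        esac |
            awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
    done
}

scan_dir "$dir"
```

With this sample data the output is one line per file: `a.log` with `,2,1` and `b.log.gz` with `,4,2`, each prefixed by the temporary directory path.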

score 1 · Accepted Answer

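I think it could be simpler than that.

With shell expansion you already have the file name (so you can print it). So you can loop over all the files and, for each one, do the following:

print the file name
zgrep -c ERROR $file (this outputs the number of lines containing 'ERROR')
zcat $file|wc -l (this outputs the number of lines)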

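zgrep works on both plain-text and gzip files. zcat does too when given -f (at least with GNU gzip); without -f it rejects input that is not in gzip format.

Assuming you don't have any whitespace in your paths/file names:

for f in /tmp/appscraps/*
do
   n_lines=$(zcat -f "$f"|wc -l)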
   n_errors=$(zgrep -c ERROR "$f")
   echo "$f $n_lines $n_errors"
done

This is untested, but it should work.

score 0 · Accepted Answer

You can run the following command on each file:

gunzip -t FILENAME; echo $?

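It reports the result through its exit code, which echo then prints: 0 for a valid gzip file, 1 for a corrupt file or any other kind of file. You can then use an if statement to test that result and do the required processing.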
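The exit-code test can be sketched as follows. This is only an illustration under some assumptions: `gzip` is installed, the sample files and the `read_any` helper are hypothetical, and the file is fed to `gzip -t` on standard input so the check does not depend on the file name's suffix.

```shell
# Hypothetical sample data: one plain file and one gzip file.
work=$(mktemp -d)
printf 'plain ERROR line\n' > "$work/plain.txt"
printf 'zipped line\n' | gzip -c > "$work/zipped.gz"

# read_any FILE: write FILE's text to stdout, decompressing it
# only when "gzip -t" reports that it is valid gzip data.
read_any() {
    if gzip -t < "$1" 2>/dev/null; then
        gzip -cd < "$1"
    else
        cat "$1"
    fi
}

# Count lines and ERROR lines per file, whatever its type.
for f in "$work"/*
do
    read_any "$f" |
        awk -v file="$f" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
done
```

Testing the content instead of the name costs an extra pass over each file, but it also handles gzip files that lack a `.gz` suffix.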
