awk - How to handle 3 files with awk?

Question

OK, so after spending 2 days I am still not able to solve it, and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:

BEGIN { n = 50; i = n }
FNR == NR {
    # Read file-1, which has just 1 column
    ids[$1] = int(i++ / n)
    next
}
{
    # Read file-2, which has 4 columns
    # Do something
    next
}
END {...}

It works fine. But now I want to extend it to read 3 files. Let's say that, instead of hard-coding the value of "n", I need to read a properties file and set the value of "n" from it. I found this question and have tried something like this:

BEGIN { n = 0; i = 0 }
FNR == NR {
    # Block A
    # Try to read file-0
    next
}
{
    # Block B
    # Read file-1, which has just 1 column
    next
}
{
    # Block C
    # Read file-2, which has 4 columns
    # Do something
    next
}
END {...}

But it is not working. Block A is executed for file-0, and I am able to read the property from the properties file. But Block B is executed for both file-1 and file-2, and Block C is never executed.

Can someone please help me solve this? I have never used awk before and the syntax is very confusing. Also, if someone could explain how awk reads input from different files, that would be very helpful.

Please let me know if I need to add more details to the question.

score 11 · Accepted Answer

If you have GNU awk (gawk), just test ARGIND:

awk '
ARGIND == 1 { print "file 1:", $0; next }   # do file 1 stuff
ARGIND == 2 { print "file 2:", $0; next }   # do file 2 stuff
' file1 file2

If you don't have gawk, get it.

In other awks, you can just test the file name:

awk '
FILENAME == ARGV[1] { print "file 1:", $0; next }   # do file 1 stuff
FILENAME == ARGV[2] { print "file 2:", $0; next }   # do file 2 stuff
' file1 file2

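Testing FILENAME that way only fails if you want to parse the same file twice; in that case, you need to add a count of how many times the file has been opened.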
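A minimal sketch of that counting idea, assuming the goal is two passes over one hypothetical file of numbers (the first pass sums them, the second prints each value as a percentage of the total). The array name opened is an arbitrary choice:

```shell
#!/bin/sh
# Sketch: the same file passed twice; count how often each file has been
# opened so the two passes can be told apart. Data is hypothetical.
demo_two_passes() {
    tmp=$(mktemp -d)
    printf '10\n30\n60\n' > "$tmp/values.txt"

    awk '
    # A new file (or a new pass over the same file) starts at FNR == 1.
    FNR == 1 { opened[FILENAME]++ }

    # Pass 1: accumulate the total.
    opened[FILENAME] == 1 { total += $1; next }

    # Pass 2: the total is now known, so print each value as a percentage.
    opened[FILENAME] == 2 { printf "%s %d%%\n", $1, 100 * $1 / total }
    ' "$tmp/values.txt" "$tmp/values.txt"

    rm -rf "$tmp"
}

demo_two_passes
# prints:
# 10 10%
# 30 30%
# 60 60%
```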

score 7 · Accepted Answer

Update: The solution below works as long as all input files are non-empty, but see @Ed Morton's answer for a simpler and more robust way to add file-specific processing.

However, this answer hopefully still provides a useful explanation of some awk basics and of why the OP's approach doesn't work.

Try the following (note that I've made the index 1-based, since that's how awk does it):

awk '

 # Increment the current-file index, if a new file is being processed.
 FNR == 1 { ++fIndex }

 # Process current line if from 1st file.
 fIndex == 1 {
    print "file 1: " FILENAME
    next
 }

 # Process current line if from 2nd file.
 fIndex == 2 {
    print "file 2: " FILENAME
    next
 }

 # Process current line (from all remaining files).
 {
    print "file " fIndex ": " FILENAME
 }

' file-1 file-2 file-3

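- FNR == 1 is true whenever processing of a new input file starts (FNR holds the line number relative to the current input file).
  - fIndex is incremented each time a new file starts being processed and therefore reflects the 1-based index of the current input file. Hat tip to @twalberg's helpful answer.
  - Note that an uninitialized awk variable used in a numeric context defaults to 0, so there is no need to initialize fIndex (unless you want a different starting value).

- Patterns such as fIndex == 1 can be used to execute a block only for lines from a specific input file (assuming the block ends with next).
  - The last block in the program above is then executed for lines from all input files that have no file-specific block.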
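The non-empty-files caveat in the update above can be seen with an empty middle file: FNR == 1 never becomes true for a file with no lines, so fIndex undercounts. The sketch below (hypothetical file names) prints fIndex next to argi, a hypothetical counter that instead advances through ARGV until it reaches the entry equal to FILENAME. That is one possible way to emulate gawk's ARGIND in other awks; it assumes every ARGV entry is a plain file name.

```shell
#!/bin/sh
# Sketch: fIndex undercounts when an input file is empty.
# File names and contents are hypothetical.
demo_empty_file() {
    tmp=$(mktemp -d)
    printf 'line 1\nline 2\n' > "$tmp/a.txt"
    : > "$tmp/empty.txt"
    printf 'line 1\n' > "$tmp/c.txt"

    # Run from inside the temp dir so FILENAME matches the short names.
    (
        cd "$tmp" && awk '
        FNR == 1 {
            # Counts only files that have at least one line.
            ++fIndex
            # Step through ARGV until the entry equal to FILENAME, which
            # skips over files (such as empty ones) that produced no lines.
            argi++
            while (argi < ARGC && ARGV[argi] != FILENAME)
                argi++
        }
        { print FILENAME, "fIndex=" fIndex, "argi=" argi }
        ' a.txt empty.txt c.txt
    )

    rm -rf "$tmp"
}

demo_empty_file
# prints:
# a.txt fIndex=1 argi=1
# a.txt fIndex=1 argi=1
# c.txt fIndex=2 argi=3
```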

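As for why your approach didn't work:

- Your 2nd and 3rd blocks have no pattern (condition) in front of them, so they apply unconditionally to every line that reaches them, from any input file.
- Block A consumes (via next) only the lines of the first file, so every line from the subsequent input files enters your 2nd block, and its next statement then prevents the 3rd block from ever being reached.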
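The effect can be reproduced with a minimal sketch that keeps the question's structure but replaces the block bodies with print statements; the three one-line files are hypothetical:

```shell
#!/bin/sh
# Sketch: the structure from the question, with three one-line files.
# File names and contents are hypothetical.
demo_op_structure() {
    tmp=$(mktemp -d)
    printf 'n=50\n' > "$tmp/f0.txt"
    printf 'id1\n' > "$tmp/f1.txt"
    printf 'id1 a b c\n' > "$tmp/f2.txt"

    (
        cd "$tmp" && awk '
        FNR == NR { print "A: " FILENAME; next }  # true only for the first (non-empty) file
                  { print "B: " FILENAME; next }  # no pattern: every other line
                  { print "C: " FILENAME }        # never reached: B always does next
        ' f0.txt f1.txt f2.txt
    )

    rm -rf "$tmp"
}

demo_op_structure
# prints:
# A: f0.txt
# B: f1.txt
# B: f2.txt
```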

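Potential misconceptions:

- Perhaps you think that each block works as a loop that processes a single input file. This is not how awk works. Instead, the entire awk program runs inside one loop, with each iteration processing a single input line: first all lines from file 1, then all lines from file 2, and so on.
- An awk program can have any number of blocks (typically preceded by patterns), and whether a block is executed for the current input line depends solely on whether its pattern evaluates to true; a block without a pattern is executed unconditionally (across all input files). However, as you have already discovered, next inside a block can be used to skip the subsequent blocks (pattern-block pairs).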
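A minimal sketch of that per-line loop, with two hypothetical files and three rules and no next, so every rule whose pattern is true fires for the same line, in program order:

```shell
#!/bin/sh
# Sketch: awk runs the whole program once per input line, checking every
# pattern in order. File names and contents are hypothetical.
demo_per_line() {
    tmp=$(mktemp -d)
    printf 'x1\ny1\n' > "$tmp/f1.txt"
    printf 'x2\n' > "$tmp/f2.txt"

    (
        cd "$tmp" && awk '
        FNR == 1 { print "R1 new file " FILENAME }  # first line of each file
        /x/      { print "R2 has x: " $0 }          # lines containing x
                 { print "R3 any: " $0 }            # no pattern: every line
        ' f1.txt f2.txt
    )

    rm -rf "$tmp"
}

demo_per_line
# prints:
# R1 new file f1.txt
# R2 has x: x1
# R3 any: x1
# R3 any: y1
# R1 new file f2.txt
# R2 has x: x2
# R3 any: x2
```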

score 1 · Accepted Answer

Perhaps you need to consider adding some extra structure, like this:

BEGIN { file_number=0 }
FNR==1 { ++file_number }
file_number==3 && /something_else/ { ...}
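A runnable sketch of that structure, assuming /ERROR/ as a stand-in for the placeholder /something_else/ and three hypothetical files. The counter is left uninitialized, so it starts at 0 and file_number == 3 selects the third (non-empty) file:

```shell
#!/bin/sh
# Sketch: combine a per-file counter with a regular pattern.
# /ERROR/ stands in for the placeholder /something_else/; files are hypothetical.
demo_third_file_match() {
    tmp=$(mktemp -d)
    printf 'ERROR in 1\n' > "$tmp/f1.txt"
    printf 'ok\n' > "$tmp/f2.txt"
    printf 'ERROR in 3\nfine\n' > "$tmp/f3.txt"

    (
        cd "$tmp" && awk '
        FNR == 1 { ++file_number }       # 1 for the first file, 2 for the second, ...
        file_number == 3 && /ERROR/ { print FILENAME ": " $0 }
        ' f1.txt f2.txt f3.txt
    )

    rm -rf "$tmp"
}

demo_third_file_match
# prints:
# f3.txt: ERROR in 3
```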
