bash - Apply an gawk script to multiple files in a folder

Question

I would like to use the following awk line to remove every even line (and keep the odd lines) in a text file.

awk 'NR%2==1' filename.txt > output

The problem is that I struggle to either loop properly in awk or build a shell script to apply this to all *.txt fies in a folder. I tried to use this one-liner

gawk 'FNR==1{if(o)close(o);o=FILENAME;
sub(/\.txt/,"_oddlines.txt",o)}{NR%2==1; print>o}'

but that didn't remove the even lines. And I am even less familiar with shell scripting. I use gawk under win7 or cygwin with bash. Many thanks for any kind of idea.

score 3 · Accepted Answer

Your existing gawk one-liner is really close. Here it is formatted as a more readable script:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
{
    NR % 2 == 1
    print > o
}

This should make the error obvious¹. So now we remove that error:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
NR % 2 == 1 {
    print > o
}

$ awk -f foo.awk *.txt

and it works (and of course you can re-one-line-ize this).

(Normally I would do this with a for like the other answers, but I wanted to show you how close you were!)

¹Per comment, maybe not quite so obvious?

Awk's basic language construct is the "pattern-action" statement. An awk program is just a list of such statements. The "pattern" is so named because originally they were mostly grep-like regular expression patterns:

$ awk '/^be.*st$/' < /usr/share/dict/web2
beanfeast
beast
[snip]

(Except for the slashes, this is basically just running grep, since it uses the default action, print.)

Patterns can actually contain two addresses, but it's more typical to use one, as in these cases. Patterns not enclosed within slashes allow tests like FNR == 1 (File-specific Number of this Record equals 1) or NR % 2 == 1 (Number of this Record—cumulative across all files!—mod 2 equals 1).

Once you hit the open brace, though, you're into the "action" part. Now NR % 2 == 1 simply calculates the result (true or false) and then throws it away. If you leave out the "pattern" part entirely, the "action" part is run on every input line. So this prints every line.

Note that the test NR % 2 == 1 is testing the cumulative record-number. So if some file has an odd number of lines ("records"), the next file will print out every even-numbered line (and this will persist until you hit another file with an odd number of lines).

For instance, suppose the two input files are A.txt and B.txt. Awk starts reading A.txt and has both FNR and NR set to 1 for the first line, which might be, e.g., file A, line 1. Since FNR == 1 the first "action" is done, setting o. Then awk tests the second pattern. NR is 1, so NR % 2 is 1, so the second "action" is done, printing that line to A_oddlines.txt.

Now suppose file A.txt contains only that one line. Awk now goes on to file B.txt, resetting FNR but leaving NR cumulative. The first line of B might be file B, line 1. Awk tries the first "pattern", and indeed, FNR == 1 so this closes the old o and sets up the new one.

But NR is 2, because NR is cumulative across all input files. So the second pattern (NR % 2 == 1) computes 2 % 2 (which is 0) and compares == 1 which is false, and thus awk skips the second "action" for line 1 of file B.txt. Line 2, if it exists, will have FNR == 2 and NR == 3, so that line will be copied out.

(I originally assumed, since your script was close to working, that you intended this and were just stuck a bit on syntax.)

score 3 · Accepted Answer

With GNU awk you could just do:

$ awk 'FNR%2{print > (FILENAME".odd")}' *.txt

This will create a .odd file for every .txt file in the current directory containing only the odd lines.

However sed has the upper hand on conciseness here. The following GNU sed command will remove all even lines and store the old file with the extension .bck for all .txt files in the current directory:

$ sed -ni.bck '1~2p' *txt

Demo:

$ ls
f1.txt  f2.txt

$ cat f1.txt
1
2
3
4
5

$ cat f2.txt
6
7
8
9
10

$ sed -ni.bck '1~2p' *txt

$ ls
f1.txt  f1.txt.bck  f2.txt  f2.txt.bck

$ cat f1.txt
1
3
5

$ cat f1.txt.bck
1
2
3
4
5

$ cat f2.txt
6
8
10

$ cat f2.txt.bck
6
7
8
9
10

If you don't won't the back up files then simply:

$ sed -ni '1~2p' *txt

score 1 · Accepted Answer

You can try a for loop :

#!/bin/bash

for file in dir/*.txt
do    
   oddfile=$(echo "$file" | sed -e 's|\.txt|_odd\.txt|g')  #This will create file_odd.txt
   awk 'NR%2==1' "$file" > "$oddfile"  # This will output it in the same dir.
done

score 1 · Accepted Answer

Personally, I'd use

for filename in *.txt; do
    awk 'NR%2==1' "$filename" > "oddlines-$filename"
done

EDIT: quote filenames

score 1 · Accepted Answer

Your problem is that NR%2==1 is inside the {NR%2==1; print>o} 'action block' and is not kicking in as a 'condition'. Use this instead:

gawk 'FNR==1{if(o)close(o);o=FILENAME;sub(/\.txt/,"_oddlines.txt",o)};
     FNR%2==1{print > o}' *.txt

bash - Apply an gawk script to multiple files in a folder

5 回答 5

Related

Reference