Having never used awk on Linux before, I am trying to understand how it matches regular expressions. For example, in my experience the regular expression /2/ would match the 2 in both of the following lines.

  1. This will match 2
  2. This will not match 2

Now if I run the command awk '{if(NR~2)print}' sample.txt, where sample.txt has the contents

  1. 2 will be matched
  2. This will not match 2
  3. 2 may be matched

The line that is printed is This will not match 2, which indicates that it is matching line number 2, because if I replace the command with awk '{if(NR~3)print}' sample.txt it prints 2 may be matched. If I run the command awk '{if(NR~/^2$/)print}' sample.txt, it prints exactly the same line, i.e. line 2.

However the source I am referring to at http://www.youtube.com/watch?feature=player_detailpage&v=Htnno4CHVus#t=502s seems to indicate otherwise.

What am I missing, and how is the command awk '{if(NR~2)print}' sample.txt different from awk '{if(NR~/^2$/)print}' sample.txt?

2 Answers

The condition NR~2 checks whether the record number, NR, matches the regular expression 2. For a two- or three-line input file, the expression is equivalent to:

if (NR == 2)

Similarly with NR~3, of course. Try:

awk '/2/'

That will print all lines where the text of the line ($0) contains a 2. By default, a regular expression matches against the whole line; you could limit it to a particular field with $3 ~ /3/, for example.
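As a rough illustration of the difference between matching the whole line and matching one field, here is a small sketch. The three input lines are invented for the example and are not from the question.

```shell
# Hypothetical three-field sample lines (an assumption, not from the question).
input='a b 13
a b 7
2 x 30'

# /2/ is tested against the whole record ($0); only the third line contains a 2.
whole_line=$(printf '%s\n' "$input" | awk '/2/')

# $3 ~ /3/ is tested against the third field only; "13" and "30" contain a 3.
third_field=$(printf '%s\n' "$input" | awk '$3 ~ /3/')

printf 'whole line:\n%s\n' "$whole_line"
printf 'third field:\n%s\n' "$third_field"
```

The first command should print only the last line, and the second should print the first and last lines.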

An awk program consists of patterns and actions, where either the pattern or the action may be omitted.

awk '{ if ($0 ~ /2/) print }
     /2/
     /2/ { if ($0 ~ /a.*z/) print "Matches a.*z"; }'

The first line has no pattern; the action in the { ... } is executed for each input line (but only some input lines will generate output because of the conditional). All lines that contain a 2 will be printed. (If there is no argument to print, it prints $0 followed by a newline.)

The second line has a pattern but no action; all lines that contain a 2 will be printed again. (The missing action is equivalent to { print }.)

The third line has both a pattern and an action; all lines that both contain a 2 and also contain an 'a' followed by a 'z' will be remarked upon.
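To see the three rules working together, the program can be fed a few lines on standard input. This is only a sketch; the input lines are made up for the illustration.

```shell
# Hypothetical input (an assumption): one line without a 2, one with a 2,
# and one with a 2 plus an "a" that is later followed by a "z".
input='no digits here
line 2
2 lazy dogs'

output=$(printf '%s\n' "$input" | awk '
    { if ($0 ~ /2/) print }                          # action only
    /2/                                               # pattern only
    /2/ { if ($0 ~ /a.*z/) print "Matches a.*z" }    # pattern and action
')

printf '%s\n' "$output"
```

Each line containing a 2 should appear twice (once from each of the first two rules), and the last line should additionally trigger the remark from the third rule. The line without a 2 produces no output at all.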


How are these two commands different?

 `awk '{if(NR~2)print}' sample.txt`
 `awk '{if(NR~/^2$/)print}' sample.txt`

The first command will print the lines numbered 2, 12, 20..29, 32, 42, ... 102, 112, 120..129, ... 200..299, ...; that is, all lines whose line number contains a 2.

The second command will print only line number 2, because /^2$/ constrains the value to consist of start of string, the digit 2, and end of string.
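A quick way to check this is to run both conditions over a longer input. The sketch below assumes a made-up input of 30 identical lines containing just "x", and prints NR instead of the line itself so that the matching record numbers are visible; neither detail is part of the original commands.

```shell
# Generate 30 lines whose content never contains a digit (hypothetical input),
# so any match must come from the record number NR, not from the text.
make_input() {
    awk 'BEGIN { for (i = 1; i <= 30; i++) print "x" }'
}

# Unanchored: NR is matched as a string against the regex 2.
unanchored=$(make_input | awk '{ if (NR ~ 2) print NR }' | paste -s -d ' ' -)

# Anchored: the whole record number must be exactly "2".
anchored=$(make_input | awk '{ if (NR ~ /^2$/) print NR }' | paste -s -d ' ' -)

printf 'NR ~ 2     : %s\n' "$unanchored"
printf 'NR ~ /^2$/ : %s\n' "$anchored"
```

With 30 input lines, the unanchored form should report 2, 12 and 20 through 29, while the anchored form should report only 2.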


I take it that means that the source is wrong?

Now that I've looked at the YouTube resource, I think you must have misunderstood what it is trying to teach. When it talks about {if (NR~2) print}, it should be saying that it will print any line whose line number contains a 2; the video cites line numbers 2, 12, 20, 21, 22, etc. It should not be saying any line which contains a 2. I think the audio does say that, but that appears to be a slip of the tongue; the text on screen is accurate. The comparison against NR is not actually wrong, but it is unconventional; I'm not sure that I'd include regexes against NR in an introductory video describing awk. I may still have missed something.


The command awk '{ if ($0 ~ /2/) print }' run against the file sample.txt with the contents I mentioned would only output 2 will be matched. Is that correct?

That command, given the input:

2 will be matched
This will not match 2
2 may be matched

will print all three lines; they all contain the digit 2.

I also thought that the action was print and the pattern was $0 ~ /2/.

No; the pattern was empty (because there was nothing before the open brace) — so all lines match it — and the action was the part in braces { if ($0 ~ /2/) print }. Now, the action contains a conditional, but that's a separate issue.

Now the command awk '/2/' sample.txt would print all three lines. Is that correct?

Yes.
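For reference, the answers above can be checked with a short sketch that uses the three sample lines from the question, piped in on standard input rather than read from sample.txt (that substitution is just for convenience here).

```shell
# The three lines of sample.txt from the question.
sample='2 will be matched
This will not match 2
2 may be matched'

# Matching on line content: both forms test $0 against /2/.
by_content_if=$(printf '%s\n' "$sample" | awk '{ if ($0 ~ /2/) print }')
by_content_pattern=$(printf '%s\n' "$sample" | awk '/2/')

# Matching on the record number: only record 2 has a number containing a 2.
by_record_number=$(printf '%s\n' "$sample" | awk '{ if (NR ~ 2) print }')

printf 'content (if form):\n%s\n\n' "$by_content_if"
printf 'content (pattern form):\n%s\n\n' "$by_content_pattern"
printf 'record number:\n%s\n' "$by_record_number"
```

Both content-based forms should print all three lines, whereas the NR-based form should print only the second line.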

answered 2012-06-16T22:11:45.693
NR means the Number of the Record being processed... You are matching against line number 2.

answered 2012-06-16T22:11:34.833