linux - Puzzling egrep behavior when matching newline characters

Question

I am very confused by the following egrep behavior:

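I have a file with LF line endings. When I grep for $'\n', all lines are returned, as expected. However, when I grep for $'\r\n', all lines are returned as well, even though the file contains no carriage returns. Why does grep behave in this puzzling way?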

[pjanowsk@krakow myplay2]$ cat sample.txt
a
b
n
c
[pjanowsk@krakow myplay2]$ file sample.txt
sample.txt: ASCII text
[pjanowsk@krakow myplay2]$ egrep $'\n' sample.txt 
a
b
n
c
[pjanowsk@krakow myplay2]$ egrep $'\r\n' sample.txt 
a
b
n
c

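Furthermore, when I convert the file to CRLF line endings, grepping for a newline matches all lines, but grepping for carriage return + newline returns empty strings. Why?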

[pjanowsk@krakow myplay2]$ unix2dos sample.txt 
unix2dos: converting file sample.txt to DOS format ...
[pjanowsk@krakow myplay2]$ file sample.txt 
sample.txt: ASCII text, with CRLF line terminators
[pjanowsk@krakow myplay2]$ egrep $'\n' sample.txt 
a
b
n
c
[pjanowsk@krakow myplay2]$ egrep $'\r\n' sample.txt 




[pjanowsk@krakow myplay2]$

Finally, if I egrep for '\n' using strong quotes but without C-style escaping, I get a match on "n", even though there is no backslash. Why?

[pjanowsk@krakow myplay2]$ egrep '\n' sample.txt 
n

score 1 · Accepted Answer

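The first egrep returns every line because of what your shell hands to egrep. In bash, $'\n' is not a variable named '\n'; it is ANSI-C quoting and expands to a single newline character. grep treats newlines in the pattern as separators between patterns, so a pattern consisting only of a newline is effectively empty, and egrep behaves like "egrep '' sample.txt". That returns all lines.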
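One way to check what egrep actually receives is to inspect the expanded argument and compare match counts. This is a minimal sketch, assuming bash (for the $'...' quoting) and GNU grep; the temporary file simply recreates the four-line sample.txt from the question, and the variable names are only illustrative.

```shell
# Recreate the sample file from the question (LF line endings) in a temp file.
tmp=$(mktemp)
printf 'a\nb\nn\nc\n' > "$tmp"

# bash expands $'\n' before grep runs; od shows the byte that is passed on.
printf '%s' $'\n' | od -An -c

# An empty pattern matches every line.
empty_count=$(grep -E -c '' "$tmp")
# A pattern that is only a newline is split into empty patterns.
newline_count=$(grep -E -c $'\n' "$tmp")
# CR + LF is split into the pattern "CR" and an empty pattern.
crlf_count=$(grep -E -c $'\r\n' "$tmp")

echo "$empty_count $newline_count $crlf_count"
rm -f "$tmp"
```

If all three counts come out equal to the number of lines in the file, the newline in the argument is acting as a pattern separator rather than as something to search for.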

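I don't think grep or egrep allow matching the end-of-line character itself. They use EOL to split the file into lines, each of which either matches or does not.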
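The line-by-line behavior can be seen by putting a newline between two ordinary characters in the pattern. The sketch below again assumes bash and GNU grep, with a temporary copy of the sample file; `matched` is just an illustrative variable name.

```shell
# Recreate the sample file from the question in a temp file.
tmp=$(mktemp)
printf 'a\nb\nn\nc\n' > "$tmp"

# The newline does not match the line ending between "a" and "b".
# grep reads the argument as two patterns, "a" and "b", and tests
# each input line against them separately.
matched=$(grep -E $'a\nb' "$tmp")
printf '%s\n' "$matched"

rm -f "$tmp"
```

Here the lines "a" and "b" are selected independently; the same two lines would be printed even if they were not adjacent in the file.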

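You can use pcregrep, which uses "Perl-compatible" regular expressions and will happily match multi-line regular expressions (multi-line matching is enabled with its -M option).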
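A hedged sketch of the pcregrep approach follows. It assumes pcregrep is installed (on systems that ship PCRE2 the tool may be named pcre2grep instead) and skips the search otherwise; the temporary file mirrors sample.txt from the question.

```shell
# Recreate the sample file from the question in a temp file.
tmp=$(mktemp)
printf 'a\nb\nn\nc\n' > "$tmp"

if command -v pcregrep >/dev/null 2>&1; then
    # -M allows a single match to span several lines, so \n in the
    # pattern can match the line ending between "a" and "b".
    multi=$(pcregrep -M 'a\nb' "$tmp")
    printf '%s\n' "$multi"
else
    echo "pcregrep not installed; skipping" >&2
fi

rm -f "$tmp"
```

Unlike grep, this only reports "a" and "b" when they sit on consecutive lines.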

score 0 · Accepted Answer

You could try one of these:

  -U, --binary              do not strip CR characters at EOL (MSDOS)
  -u, --unix-byte-offsets   report offsets as if CRs were not there (MSDOS)
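As a rough illustration, carriage returns in a CRLF file can be searched for by anchoring a CR at the end of the line. This sketch assumes bash and GNU grep on Linux, where -U is accepted but generally should not change the result because CRs are not stripped there in the first place; `cr_lines` is an illustrative name.

```shell
# Build a CRLF-terminated version of the sample file from the question.
tmp=$(mktemp)
printf 'a\r\nb\r\nn\r\nc\r\n' > "$tmp"

# $'\r$' expands to a CR followed by the regex anchor "$",
# i.e. "a carriage return at the end of the line".
cr_lines=$(grep -U -c $'\r$' "$tmp")
echo "$cr_lines"

rm -f "$tmp"
```

A count equal to the number of lines suggests every line ends in CR, which is consistent with what `file` reports as "with CRLF line terminators".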
