regex - How do I use a plus sign with a character class as part of a regular expression?

Question

In cygwin, this does not return a match:

$ echo "aaab" | grep '^[ab]+$'

But this does return a match:

$ echo "aaab" | grep '^[ab][ab]*$'
aaab

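Aren't the two expressions equivalent? Is there a way to express "one or more characters from a character class" without typing the character class twice (as in the second example)?

According to this link, the two expressions should be the same, but perhaps Regular-Expressions.info does not cover bash in cygwin.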

score 7 · Accepted Answer

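grep has several matching "modes". By default it uses only the basic set, which does not treat many metacharacters as special unless they are escaped. You can put grep into extended or Perl mode so that + is evaluated as a quantifier.

From man grep: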

Matcher Selection
  -E, --extended-regexp
     Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

  -P, --perl-regexp
     Interpret PATTERN as a Perl regular expression.  This is highly experimental and grep -P may warn of unimplemented features.


Basic vs Extended Regular Expressions
  In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

  Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

  GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification.  For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression.  POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.

Alternatively, you can use egrep instead of grep -E.
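To see why the pattern in the question matches nothing, it can help to check what + means in each mode. The following is a minimal sketch, assuming GNU grep (as shipped with Cygwin); the variable names are illustrative only.

```shell
# In basic mode (the default), + is an ordinary character, so this
# pattern matches exactly one a-or-b followed by a literal plus sign.
basic_literal=$(echo "a+" | grep '^[ab]+$')

# The input from the question therefore does not match in basic mode.
# "|| true" keeps the script going, since grep exits with status 1 on no match.
basic_nomatch=$(echo "aaab" | grep '^[ab]+$' || true)

# In extended mode (-E), + means "one or more of the preceding item".
extended_match=$(echo "aaab" | grep -E '^[ab]+$')

echo "basic, input a+:      '$basic_literal'"
echo "basic, input aaab:    '$basic_nomatch'"
echo "extended, input aaab: '$extended_match'"
```

The first command matching "a+" is the telling part: the basic-mode pattern is valid, it just describes a different set of strings than the one intended.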

score 6 · Accepted Answer

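In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So use the backslashed version: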

$ echo aaab | grep '^[ab]\+$'
aaab

Or enable the extended syntax:

$ echo aaab | egrep '^[ab]+$'
aaab
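If portability beyond GNU grep matters, it may be worth knowing that \+ in basic mode is a GNU extension, while POSIX basic syntax spells "one or more" as the interval \{1,\}. A short sketch, assuming a POSIX-conforming grep; the variable names are illustrative only.

```shell
# \{1,\} is the POSIX basic-regex interval for "at least one repetition",
# so the character class still only has to be written once.
portable_match=$(echo "aaab" | grep '^[ab]\{1,\}$')
echo "matched: '$portable_match'"

# A line containing any character outside the class is rejected.
# "|| true" absorbs grep's exit status 1 for the no-match case.
portable_reject=$(echo "aaabc" | grep '^[ab]\{1,\}$' || true)
echo "rejected: '$portable_reject'"
```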

score 2 · Accepted Answer

Escape it with a backslash, or use egrep, the extended grep, which is equivalent to grep -E:

$ echo "aaab" | egrep '^[ab]+$'
aaab

$ echo "aaab" | grep '^[ab]\+$'
aaab
