regex - General approach for (the equivalent of) "backreferences in character classes"?

Question

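In Perl regular expressions, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not when \1, \2, etc. appear inside a character class. In the latter case, the \ is treated as an ordinary escape character, so inside a class \1 is the octal escape for the character with code point 1 (U+0001), not a reference to group 1.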

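So if, for example, one wants to match a string (of length greater than 1) whose first character matches its last character but does not appear anywhere else in the string, the following regex will not work: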

/\A       # match beginning of string;
 (.)      # match and capture first character (referred to subsequently by \1);
 [^\1]*   # (WRONG) match zero or more characters different from character in \1;
 \1       # match \1;
 \z       # match the end of the string;
/sx       # s: let . match newline; x: ignore whitespace, allow comments

It fails because it matches, for example, the string 'a1a2a':

  DB<1> ( 'a1a2a' =~ /\A(.)[^\1]*\1\z/ and print "fail!" ) or print "success!"
fail!
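A short script makes the failure concrete. It is a sketch that assumes a standard perl 5 interpreter; it checks what the class `[\1]` actually matches and confirms that the naive pattern accepts 'a1a2a'. If `[^\1]` excluded the digit "1", 'a1a2a' could not match, so the "fail!" output above is consistent with `\1` being read as an octal escape inside the class.

```perl
use strict;
use warnings;

# Inside a bracketed class, \1 is an octal escape for the character
# with code point 1 (U+0001), not a backreference to group 1.
my $class_is_octal = ("\x01" =~ /\A[\1]\z/) ? 1 : 0;
my $class_is_digit = ("1"    =~ /\A[\1]\z/) ? 1 : 0;

# The naive pattern therefore accepts strings it should reject.
my $naive = qr/\A(.)[^\1]*\1\z/s;
my $naive_accepts_a1a2a = ('a1a2a' =~ $naive) ? 1 : 0;

print "[\\1] matches U+0001:   $class_is_octal\n";
print "[\\1] matches '1':      $class_is_digit\n";
print "naive matches 'a1a2a': $naive_accepts_a1a2a\n";
```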

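I can usually manage to find some workaround¹, but it is always specific to the problem at hand, and usually much more complicated than what I could write if I were able to use backreferences inside a character class.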

Is there a general (and, ideally, simple) workaround?

¹For example, for the problem in the example above, I would use something like

/\A
 (.)              # match and capture first character (referred to subsequently
                  # by \1);
 (?!.*\1.+\z)     # a negative lookahead assertion for "a suffix containing \1";
 .*               # substring not containing \1 (as guaranteed by the preceding
                  # negative lookahead assertion);
 \1\z             # match last character only if it is equal to the first one
/sx

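...where I have replaced the fairly simple (though, alas, incorrect) subexpression [^\1]* from the earlier regex with the more daunting negative lookahead assertion (?!.*\1.+\z). This assertion basically says "give up if \1 appears anywhere beyond this point (except at the last position)." Incidentally, I give this solution only to illustrate the kind of workaround I mentioned in the question; I don't claim it's a particularly good one.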
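To check that the lookahead workaround from the footnote behaves as described, it can be run against a few sample strings. This is a sketch assuming a standard perl 5 interpreter; the sample strings are arbitrary.

```perl
use strict;
use warnings;

# The footnote's workaround: a single negative lookahead, placed right after
# the first character, rejects any later occurrence of that character that
# is followed by at least one more character (i.e. any non-final occurrence).
my $workaround = qr/
    \A
    (.)              # capture the first character as \1
    (?!.*\1.+\z)     # fail if \1 reappears anywhere except at the very end
    .*               # the middle, now known to be free of \1
    \1\z             # the last character must equal the first
/sx;

for my $s (qw(aba abca aa a1a2a abcb)) {
    printf "%-6s %s\n", $s, ($s =~ $workaround ? 'match' : 'no match');
}
```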

score 14 · Accepted Answer

This can be done with a negative lookahead inside a repeated group:

/\A         # match beginning of string;
 (.)        # match and capture first character (referred to subsequently by \1);
 ((?!\1).)* # match zero or more characters different from character in \1;
 \1         # match \1;
 \z         # match the end of the string;
/sx
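For a quick check, the answer's pattern can be wrapped in a small script. This is a sketch; the helper name `ends_match_uniquely` is hypothetical. It uses the non-capturing form `(?:(?!\1).)*`, which matches the same strings as `((?!\1).)*` but does not store the repeated group in `$2`. The idea is that the lookahead tests each position before the `.` consumes a character, so the repetition can never step over an occurrence of `\1`.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string's first character equals its
# last character and occurs nowhere in between.
sub ends_match_uniquely {
    my ($s) = @_;
    return $s =~ m{
        \A
        (.)            # first character, captured as \1
        (?:(?!\1).)*   # each following char is consumed only if it is not \1
        \1             # the last character must be \1
        \z
    }sx ? 1 : 0;
}

for my $s (qw(aba aa abca a1a2a abcb)) {
    printf "%-6s %s\n", $s, ends_match_uniquely($s) ? 'match' : 'no match';
}
```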

This pattern also works when the group contains more than one character.
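The multi-character case can be illustrated with a two-character delimiter. In this sketch the delimiters `**` and `__` are arbitrary choices for illustration: the lookahead `(?!\1)` rejects the whole two-character sequence, while a lone `*` or `_` inside is still allowed. A character class, which only ever matches one character, could not express that exclusion.

```perl
use strict;
use warnings;

# Match text wrapped in the same two-character delimiter ("**" or "__")
# where that delimiter does not occur inside. A single "*" or "_" is allowed.
my $delimited = qr/\A(\*\*|__)(?:(?!\1).)*\1\z/s;

for my $s ('**bold**', '__it_is__', '**a**b**', '**mixed__') {
    printf "%-10s %s\n", $s, ($s =~ $delimited ? 'match' : 'no match');
}
```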
