regex - What does the word boundary do in Perl?

Question

$string = "Aa Aa 122";

 /\b[A-Z][a-z]*\b/

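\b \b => Does it search for repetitions of Aa Aa, or for repetitions of whatever is not inside the []?

Also, what does the following regular expression do?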

/(.).*\1/

score 3 · Accepted Answer

The regular expression /\b[A-Z][a-z]*\b/ searches for capitalized words that consist only of ASCII letters. Examples:

Foo B Ba Bar

but not

bAr FOO foo BAR Føø BäÞ b
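
As a rough check of the two lists above, the following sketch runs the regex against the ASCII-only sample words. The variable names are made up for illustration, and the non-ASCII words are left out because their behaviour depends on whether the string is treated as Unicode characters or as bytes.

```perl
use strict;
use warnings;

# Regex from the question: one uppercase ASCII letter followed by zero or
# more lowercase ASCII letters, with a word boundary on each side.
my $re = qr/\b[A-Z][a-z]*\b/;

# ASCII-only sample words taken from the lists above.
my @words = qw(Foo B Ba Bar bAr FOO foo BAR b);

my @matched  = grep { /$re/ } @words;
my @rejected = grep { !/$re/ } @words;

print "matched:  @matched\n";    # matched:  Foo B Ba Bar
print "rejected: @rejected\n";   # rejected: bAr FOO foo BAR b
```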

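\b is a zero-width assertion: it consumes no characters and matches only at a word boundary, that is, a position where a "word character" (\w) and a non-word character (or the start or end of the string) are adjacent. It is equivalent to the lookarounds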

(?<!\w)(?=\w)|(?<=\w)(?!\w)
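
One way to see the equivalence is to list every offset at which each pattern matches in the question's string. This is only a sketch; the helper name positions is hypothetical.

```perl
use strict;
use warnings;

my $string = "Aa Aa 122";   # sample string from the question

# Return every offset at which a zero-width pattern matches in $text.
sub positions {
    my ($re, $text) = @_;
    my @offsets;
    while ($text =~ /$re/g) {
        push @offsets, pos($text);
    }
    return @offsets;
}

my @with_b          = positions(qr/\b/, $string);
my @with_lookaround = positions(qr/(?<!\w)(?=\w)|(?<=\w)(?!\w)/, $string);

print "\\b:         @with_b\n";            # 0 2 3 5 6 9
print "lookarounds: @with_lookaround\n";   # 0 2 3 5 6 9
```

Both patterns report the same six positions: the start and end of Aa, Aa, and 122.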

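\1 is a backreference to the first capture group and matches the literal text that the group captured. This regex lets us match the string foo enclosed in either double or single quotes: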

/(["'])foo\1/  # "foo" and 'foo' but not "foo' or 'foo"

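It matches a double or single quote and remembers which one it was. After foo, that same character must appear again. Your regex is a more general form in which foo can be anything and the quote character can be any character. It finds the first, longest run of non-newline characters that begins and ends with the same character, for example in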

:"':foo':"

This matches :"':foo':, because it is the longest such string starting at the first position.
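
Both backreference examples can be tried with a short script. This is a sketch with made-up variable names, not part of the original answer.

```perl
use strict;
use warnings;

# Quoted-foo example: the closing quote must be the same as the opening one.
my $quoted   = qr/(["'])foo\1/;
my @quote_ok = grep { /$quoted/ } (q{"foo"}, q{'foo'}, q{"foo'}, q{'foo"});
print "accepted: @quote_ok\n";   # accepted: "foo" 'foo'

# General form from the question, applied to the sample string above.
my $text = q{:"':foo':"};
my ($whole, $char);
if ($text =~ /(.).*\1/) {
    ($whole, $char) = ($&, $1);  # $& is the whole match, $1 the captured character
}
print "match: $whole (delimiter: $char)\n";   # match: :"':foo': (delimiter: :)
```

The greedy .* first runs to the end of the string, then backtracks to the last colon, which is why the match stops just before the final double quote.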

score 3 · Accepted Answer

The \b matches a word boundary. So the regular expression matches "Aa" but not "AaB" because there's no word boundary between the "a" and "B" in the second string; that is, they're part of the same word.

See the Perl documentation on zero-width assertions (perlre).

The second one matches a string that contains two of the same character anywhere in it. It works because the \1 matches whatever character was matched by the (.), which matches any single character (that's .) and remembers it as \1 (that's the (...)). The .* means that anything can come in between them ("zero or more characters").

So "abra" matches (\1 is "a"), as does "adder" (\1 is "d", and the .* matches zero characters), but not "black".
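
The three words can be checked directly. This is a minimal sketch; the %repeated hash is just an illustrative name.

```perl
use strict;
use warnings;

# For each word, record the repeated character captured by (.), or undef
# if no character occurs twice.
my %repeated;
for my $word (qw(abra adder black)) {
    $repeated{$word} = $word =~ /(.).*\1/ ? $1 : undef;
}

for my $word (qw(abra adder black)) {
    my $result = defined $repeated{$word}
        ? "matches, captured '$repeated{$word}'"
        : "no match";
    print "$word: $result\n";
}
# abra: matches, captured 'a'
# adder: matches, captured 'd'
# black: no match
```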
