regex - Regular expression explained in English

Question

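I have looked here, and as far as I understand, the following regular expression just means "any sequence of Unicode characters". Can someone confirm this?

Current regex: /^(?>\P{M}\p{M}*)+$/u

Also, if I read the manual, it says: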

a) \P{M} = \PM

b) (?>\PM\pM*) = \X
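Claim (a) is easy to check on a concrete engine. Here is a small sketch in Java rather than PHP. That choice is an assumption made for illustration, but java.util.regex also accepts both the braced form `\P{M}` and the single-letter form `\PM`. The class and method names are hypothetical. The code compares the two spellings on a few sample code points:

```java
import java.util.regex.Pattern;

public class MarkShorthand {
    // Braced and single-letter spellings of "one code point NOT in the Unicode Mark category".
    public static final Pattern NON_MARK_BRACED = Pattern.compile("\\P{M}");
    public static final Pattern NON_MARK_SHORT = Pattern.compile("\\PM");

    // True when both spellings give the same verdict for the whole input.
    public static boolean sameVerdict(String s) {
        return NON_MARK_BRACED.matcher(s).matches() == NON_MARK_SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "a" is a letter, U+0301 (COMBINING ACUTE ACCENT) is a mark, "1" is a digit.
        String[] samples = {"a", "\u0301", "1"};
        for (String s : samples) {
            System.out.printf("U+%04X  non-mark=%b  same verdict=%b%n",
                    s.codePointAt(0), NON_MARK_SHORT.matcher(s).matches(), sameVerdict(s));
        }
    }
}
```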

So, given these two points, can't I simplify the regular expression to the following?

Proposed regex: /^\X+$/u

I still don't quite understand it...

score 2 · Accepted Answer

^            # start of string, followed by
(?>          # an atomic (non-backtracking, non-capturing) group containing
    \P{M}    # a single Unicode character that is not in the `Mark` category
    \p{M}*   # followed by 0 or more characters in the `Mark` category
)+           # with this group repeated 1 or more times
$            # the end of the string (or just before a trailing newline)
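The breakdown above can be tried out directly. This is a minimal sketch in Java rather than PHP, which is an assumption made for illustration; java.util.regex supports `\p{M}`, `\P{M}` and atomic groups. The class name `AnnotatedPattern` is hypothetical:

```java
import java.util.regex.Pattern;

public class AnnotatedPattern {
    // Java spelling of the question's PCRE pattern. Java matches on UTF-16 strings and
    // \p{M} always uses Unicode categories, so no flag is added for PHP's /u modifier.
    public static final Pattern COMBINING_SEQUENCES =
            Pattern.compile("^(?>\\P{M}\\p{M}*)+$");

    // find() plus the explicit ^...$ anchors, similar to how preg_match applies the pattern.
    public static boolean accepts(String s) {
        return COMBINING_SEQUENCES.matcher(s).find();
    }

    public static void main(String[] args) {
        // "e" + U+0301: one base character followed by one combining mark.
        System.out.println(accepts("e\u0301")); // true
        // Starts with a mark, so the first \P{M} has nothing to match.
        System.out.println(accepts("\u0301e")); // false
        // The + quantifier needs at least one base character.
        System.out.println(accepts(""));        // false
    }
}
```

So the pattern accepts any non-empty string made of base characters, each followed by zero or more marks. The empty string is rejected, and so is a string that begins with a combining mark.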

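An atomic group (?> ... ) does not capture, so neither pattern contains a capturing group, and ^\X+$ is equivalent as long as \X means (?>\P{M}\p{M}*), as the manual states. In PCRE 8.32 and later (and in PCRE2), \X matches a full extended grapheme cluster instead. In those versions the two patterns can disagree on some inputs, for example a string that begins with a combining mark.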
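Running the two patterns side by side shows both the equivalence and the edge case. This sketch assumes Java 9 or later, where java.util.regex supports `\X` as an extended grapheme cluster (the modern definition, not the older manual wording). The class and method names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ExplicitVsX {
    // The pattern from the question, spelled out with an atomic group.
    public static final Pattern EXPLICIT = Pattern.compile("^(?>\\P{M}\\p{M}*)+$");
    // The proposed shortcut; Java accepts \X (extended grapheme cluster) since Java 9.
    public static final Pattern GRAPHEME = Pattern.compile("^\\X+$");

    public static boolean explicitAccepts(String s) {
        return EXPLICIT.matcher(s).find();
    }

    public static boolean graphemeAccepts(String s) {
        return GRAPHEME.matcher(s).find();
    }

    // Renders a string as code points so combining marks are visible in the output.
    static String describe(String s) {
        if (s.isEmpty()) {
            return "(empty)";
        }
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "e\u0301", "\u0301e", ""};
        for (String s : samples) {
            System.out.printf("%-22s explicit=%b  grapheme=%b%n",
                    describe(s), explicitAccepts(s), graphemeAccepts(s));
        }
    }
}
```

On these samples, both patterns accept "abc" and "e" + U+0301, and both reject the empty string. They disagree on U+0301 followed by "e". The explicit pattern rejects it because there is no base character to start with, while `\X` treats the lone mark as a cluster of its own.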

score 2 · Accepted Answer

Yes, \P{M}\p{M}* can be simplified to \X. In my experience, though, not every regex flavor supports \X, whereas \P{M} and \p{M} are supported much more widely.

For example, the .NET regex engine and Java's engine before Java 9 do not support \X (Perl does, of course...).

For more information, see: http://www.regular-expressions.info/unicode.html
