java - Understanding `+` in a regular expression

Question

I have a regular expression that removes all usernames from a tweet. It looks like this:

regexFinder = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):";

I'm trying to understand what each component does. So far, I have:

(       Used to begin a “group” element
?:      Starts non-capturing group (this means one that will be removed from the final result)
\\s     Matches against shorthand characters
|       or
\\A     Matches at the start of the string and matches a position as opposed to a character
[@]     Matches against this symbol (which is used for Twitter usernames)
+       Match the previous followed by
([A-Za-z0-9-_]  Match against any capital or small characters and numbers or hyphens

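I'm a bit lost on the last part, though. Can someone tell me what +): means? I assume the parenthesis closes the group, but I don't understand the colon or the plus sign.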

If I've misunderstood anything about the regex, please feel free to point it out!

score 1 · Accepted Answer

Actually, the + means "one or more".

In this case, [@]+ means "one or more @ symbols" and [A-Za-z0-9-_]+ means "one or more letters, digits, dashes, or underscores". The + is one of several quantifiers.

The colon at the end just ensures that the match ends with a colon.
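To see both points in practice, here is a minimal sketch that runs the pattern from the question over a few made-up tweets. The class name MentionDemo, the helper usernames, and the sample strings are my own, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionDemo {
    // The pattern from the question, unchanged.
    static final Pattern MENTION =
            Pattern.compile("(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):");

    // Collects the text captured by group 1 for every match in the input.
    static List<String> usernames(String tweet) {
        List<String> names = new ArrayList<>();
        Matcher m = MENTION.matcher(tweet);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(usernames("@alice: hi"));        // [alice]
        System.out.println(usernames("RT @@bob_1: hello")); // [bob_1], since [@]+ accepts repeated @
        System.out.println(usernames("@carol hi"));         // [], since there is no colon after the name
    }
}
```

The last call shows why the trailing colon matters: without it, the pattern does not match at all.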

Sometimes it helps to look at a visualization of the pattern, such as one generated by debuggex.

score 1 · Accepted Answer

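The + symbol means "the preceding character or group can be repeated 1 or more times". Contrast this with the * symbol, which means "the preceding character or group can be repeated 0 or more times". As far as I can tell, the colon is a literal: it matches a literal : in the string.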
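The difference between + and * only shows up when there are zero repetitions. A small sketch, assuming a throwaway class PlusVersusStar and the toy patterns @+ and @* (neither is from the question):

```java
import java.util.regex.Pattern;

class PlusVersusStar {
    // Pattern.matches requires the whole input to match the regex.
    static boolean matchesPlus(String s) {
        return Pattern.matches("@+", s);
    }

    static boolean matchesStar(String s) {
        return Pattern.matches("@*", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesPlus(""));    // false: + needs at least one @
        System.out.println(matchesStar(""));    // true: * accepts zero @
        System.out.println(matchesPlus("@@@")); // true
        System.out.println(matchesStar("@@@")); // true
    }
}
```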

score 1 · Accepted Answer

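A plus sign in a regular expression means "one or more occurrences of the preceding character or group of characters". Since the second plus is inside the second set of parentheses, it basically means that the second set of parentheses matches any string made up of at least one lowercase or uppercase letter, digit, hyphen, or underscore.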

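As for the colon at the end, it has no special meaning in Java's regex syntax, so it simply matches a literal colon, as the other answers have already pointed out.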
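Here is a rough sketch of how the two sets of parentheses behave differently. The first one, (?: ... ), is non-capturing, so it still consumes text but gets no group number; the second one becomes group 1. The class GroupDemo and its helper names are hypothetical, and the sample input is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The pattern from the question, unchanged.
    static final String REGEX = "(?:\\s|\\A)[@]+([A-Za-z0-9-_]+):";

    // Returns { whole match, group 1 } for the first match, or an empty array.
    static String[] wholeAndCaptured(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    // Deletes every match, including the whitespace eaten by (?:\s|\A).
    static String stripMentions(String input) {
        return input.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String[] parts = wholeAndCaptured("hello @bob: how");
        System.out.println("[" + parts[0] + "]"); // [ @bob:]  leading space is part of the match
        System.out.println("[" + parts[1] + "]"); // [bob]     only the second parentheses capture
        System.out.println(stripMentions("hello @bob: how")); // hello how
    }
}
```

Note that the non-capturing group is not "removed" from the match: the leading whitespace is still part of group 0, which is why replaceAll deletes it along with the username.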

score 1 · Accepted Answer

Well, let's see..

[@]+                 any character of: '@' (1 or more times)
   (                 group and capture to \1:
    [A-Za-z0-9-_]+   any character of: (a-z A-Z), (0-9), '-', '_' (1 or more times)
   )                 end of capture group \1
   :                 look for and match ':'
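One way to check the breakdown above is to build the pattern up piece by piece and see how much of the input each stage consumes. This is only a sketch: the class StepByStep, the helper consumed, and the sample input are assumptions of mine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StepByStep {
    // Returns how many characters the regex consumes from the start of the
    // input, or -1 if it does not match there. lookingAt() anchors the match
    // at the start of the input but does not require it to reach the end.
    static int consumed(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "@@bob: hi";
        System.out.println(consumed("[@]+", input));                  // 2, i.e. "@@"
        System.out.println(consumed("[@]+([A-Za-z0-9-_]+)", input));  // 5, i.e. "@@bob"
        System.out.println(consumed("[@]+([A-Za-z0-9-_]+):", input)); // 6, i.e. "@@bob:"
    }
}
```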

The following quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
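If you want to try these out, the following sketch prints which inputs each quantifier accepts. The class QuantifierDemo, the helper full, and the toy patterns on the letter a are made up for the example.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"a*", "a+", "a?", "a{2}", "a{2,}", "a{2,3}"};
        String[] inputs = {"", "a", "aa", "aaa", "aaaa"};
        // One row per quantifier, one yes/no column per input (0 to 4 letters).
        for (String regex : regexes) {
            StringBuilder row = new StringBuilder(String.format("%-7s", regex));
            for (String input : inputs) {
                row.append(full(regex, input) ? " yes" : " no ");
            }
            System.out.println(row);
        }
    }
}
```

For instance, a{2,3} accepts "aa" and "aaa" but rejects "a" and "aaaa", while a? accepts only the empty string and "a".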
