regex - Question marks in regular expressions

Question

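I'm reading a regular expression reference and I'm wondering about the ? and ?? characters. Could you explain their usefulness with some examples? I don't understand them well enough.

Thank you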

score 65 · Accepted Answer

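This is an excellent question, and it took me a while to see the point of the lazy ?? quantifier myself.

? - Optional (greedy) quantifier

The usefulness of ? is easy enough to understand. If you wanted to find both http and https, you could use a pattern like this: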

https?

This pattern will match both inputs, because it makes the s optional.
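As a quick check of the claim above, here is a minimal Python sketch using the standard re module. The choice of Python and the extra input htt are mine, added only to show a failing case.

```python
import re

# https? : the trailing "s" may appear once or not at all.
pattern = re.compile(r"https?")

for text in ["http", "https", "htt"]:
    match = pattern.fullmatch(text)
    # fullmatch returns None when the whole input does not fit the pattern.
    print(text, "->", match.group(0) if match else None)
```

Both http and https pass, while htt fails because only the s is optional.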

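?? - Optional (lazy) quantifier

?? is more subtle. It usually does the same thing ? does. It doesn't change the true/false result when you ask: "Does this input satisfy this regex?" Instead, it's relevant to the question: "Which part of this input matches this regex, and which parts belong in which groups?" If an input could satisfy the pattern in more than one way, the engine decides how to group it based on ? vs. ?? (or * vs. *?, or + vs. +?).

Say you have a set of inputs that you want to validate and parse. Here's an (admittedly silly) example: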

Input:       
http123
https456
httpsomething

Expected result:
Pass/Fail  Group 1   Group 2
Pass       http      123
Pass       https     456
Pass       http      something

The first thing you try is this:

^(http)([a-z\d]+)$

Pass/Fail  Group 1   Group 2    Grouped correctly?
Pass       http      123        Yes
Pass       http      s456       No
Pass       http      something  Yes

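They all pass, but you can't use the second set of results because you only wanted 456 in Group 2.

Fine, let's try again. Let's say Group 2 can be letters or numbers, but not both, and Group 1 can be http or https: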

^(https?)([a-z]+|\d+)$

Pass/Fail  Group 1   Group 2   Grouped correctly?
Pass       http      123       Yes
Pass       https     456       Yes
Pass       https     omething  No

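Now the second input is fine, but the third one is grouped wrong, because ? is greedy by default (+ is too, but the ? came first). When deciding whether the s belongs to https? or to [a-z]+|\d+, and the result is a pass either way, the regex engine will always pick the one on the left. So Group 2 loses the s because Group 1 swallowed it.

To fix this, you make one tiny change: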

^(https??)([a-z]+|\d+)$

Pass/Fail  Group 1   Group 2    Grouped correctly?
Pass       http      123        Yes
Pass       https     456        Yes
Pass       http      something  Yes

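Essentially, this means: "Match https if you have to, but see if this still passes when Group 1 is just http." The engine realizes that the s could work as part of [a-z]+|\d+, so it prefers to put it into Group 2.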
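The three attempts above can be reproduced with a short Python sketch. The parse helper and the dictionary labels are hypothetical names of mine, and every pattern is anchored with ^ and $ here so that it has to cover the whole input.

```python
import re

inputs = ["http123", "https456", "httpsomething"]

# Labels are illustrative only; the patterns follow the three attempts above.
patterns = {
    "no optional s": r"^(http)([a-z\d]+)$",
    "greedy s?": r"^(https?)([a-z]+|\d+)$",
    "lazy s??": r"^(https??)([a-z]+|\d+)$",
}

def parse(pattern, text):
    """Return (group 1, group 2) if the input passes, or None if it fails."""
    match = re.match(pattern, text)
    return match.groups() if match else None

for label, pattern in patterns.items():
    print(label)
    for text in inputs:
        print("  ", text, "->", parse(pattern, text))
```

Every input passes under all three patterns; only the split between the groups changes, and the lazy version is the only one that groups all three inputs as expected.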

score 53 · Accepted Answer

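The key difference between ? and ?? is their laziness. ?? is lazy, ? is not.

Let's say you want to search for the word "car" in a body of text, but you don't want to be restricted to just the singular "car"; you also want to match the plural "cars".

Here's an example sentence: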

I own three cars.

Now, if I wanted to match the word "car" and only wanted to get the string "car" in return, I would use the lazy ?? like so:

cars??

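This says, "look for the word car or cars; if you find either, return car and nothing more".

Now, if I wanted to match the same words ("car" or "cars") and wanted to get the whole match in return, I'd use the non-lazy ? like so: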

cars?

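This says, "look for the word car or cars, and return either car or cars, whichever you find".

In the world of computer programming, lazy generally means "evaluating only as much as is needed". So the lazy ?? returns only as much as is needed to make a match; since the "s" in "cars" is optional, it is not returned. On the flip side, non-lazy (sometimes called greedy) operations evaluate as much as possible, so ? returns the whole match, including the optional "s".

Personally, I find myself using ? to make other regular expression operators lazy (such as the * and + operators) more often than I use it for simple character optionality, but YMMV.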
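To illustrate that last point, here is a small Python sketch of ? turning + lazy. The HTML-like sample string is an assumption of mine, chosen only because it contains two closing angle brackets.

```python
import re

text = "<b>bold</b>"

# Greedy: .+ takes as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group(0)
# Lazy: .+? takes as little as it can, so the match stops at the first ">".
lazy = re.search(r"<.+?>", text).group(0)

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```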

See it in code

Here are the examples above, implemented in Clojure:

(re-find #"cars??" "I own three cars.")
;=> "car"

(re-find #"cars?" "I own three cars.")
;=> "cars"

re-find is a function that takes a regular expression as its first argument, here #"cars??", and returns the first match it finds in the second argument, "I own three cars."

score 25 · Accepted Answer

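Some other uses of question marks in regular expressions

Apart from what's explained in the other answers, there are 3 more uses of question marks in regular expressions.

Negative lookahead

A negative lookahead is used when you want to match something that is not followed by something else. The negative lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point: x(?!x2)

Example
- Consider the word There
- Now, by default, the RegEx e will find the third letter, e, in the word There.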
```
There
  ^
```
- But if you don't want the e that is immediately followed by r, you can use the RegEx e(?!r). Now the result would be:
```
There
    ^
```
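The same two positions can be checked with a minimal Python sketch. Indexes are zero-based here, so the third letter is index 2.

```python
import re

word = "There"

# Plain e: the first e found is the third letter (index 2).
plain = re.search(r"e", word)
# e(?!r): skip any e that is directly followed by r.
not_before_r = re.search(r"e(?!r)", word)

print(plain.start())         # 2
print(not_before_r.start())  # 4
```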
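Positive lookahead

Positive lookahead works the same way. q(?=u) matches a q that is immediately followed by a u, without making the u part of the match. The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign.

Example
- Consider the word getting
- Now, by default, the RegEx t will find the third letter, t, in the word getting.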
```
getting
  ^
```
- But if you want the t that is immediately followed by i, you can use the RegEx t(?=i). Now the result would be:
```
getting
   ^
```
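A matching Python sketch for the positive case, again with zero-based indexes. It also shows that the i is checked but not consumed.

```python
import re

word = "getting"

# Plain t: the first t found is the third letter (index 2).
plain = re.search(r"t", word)
# t(?=i): only a t that is directly followed by i qualifies.
before_i = re.search(r"t(?=i)", word)

print(plain.start())      # 2
print(before_i.start())   # 3
print(before_i.group(0))  # t  (the i is not part of the match)
```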
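Non-capturing groups

Whenever you place part of a regular expression in parentheses (), it creates a numbered capturing group. The group stores the part of the string matched by the part of the regular expression inside the parentheses.

If you do not need the group to capture its match, you can optimize the regular expression to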
```
(?:Value)
```
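A short Python sketch of the difference. The input Value=42 and the trailing =(\d+) part are assumptions of mine, added so that there is a second group to number.

```python
import re

text = "Value=42"

# (Value) is capturing, so it becomes group 1 and the digits become group 2.
capturing = re.search(r"(Value)=(\d+)", text)
# (?:Value) still groups but stores nothing, so the digits become group 1.
non_capturing = re.search(r"(?:Value)=(\d+)", text)

print(capturing.groups())      # ('Value', '42')
print(non_capturing.groups())  # ('42',)
```

Both patterns match the same text; only the numbering and storage of groups differ.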


score 15 · Accepted Answer

? simply makes the preceding item (a character, character class, or group) optional:

colou?r

matches "color" and "colour"

(swimming )?pool

matches in both "a pool" and "the swimming pool"

?? is the same, but it is also lazy, so the item will be left out if at all possible. As those docs note, ?? is rare in practice. I have never used it.
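Both patterns can be tried out with a minimal Python sketch. The variable names are mine, and search is used for the second pattern because the phrases contain extra words around the match.

```python
import re

# colou?r: the single character u is optional.
spellings = [bool(re.fullmatch(r"colou?r", w)) for w in ["color", "colour"]]

# (swimming )?pool: the whole group, trailing space included, is optional.
short = re.search(r"(swimming )?pool", "a pool").group(0)
long_ = re.search(r"(swimming )?pool", "the swimming pool").group(0)

print(spellings)  # [True, True]
print(short)      # pool
print(long_)      # swimming pool
```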

score 1 · Accepted Answer

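Running the test harness from the Oracle documentation with the reluctant "once or not at all" quantifier, X??, shows that, used on its own, it works as a match that is guaranteed to always be empty.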

$ java RegexTestHarness

Enter your regex: x?
Enter input string to search: xx
I found the text "x" starting at index 0 and ending at index 1.
I found the text "x" starting at index 1 and ending at index 2.
I found the text "" starting at index 2 and ending at index 2.

Enter your regex: x??
Enter input string to search: xx
I found the text "" starting at index 0 and ending at index 0.
I found the text "" starting at index 1 and ending at index 1.
I found the text "" starting at index 2 and ending at index 2.

https://docs.oracle.com/javase/tutorial/essential/regex/quant.html

It seems to be identical to the empty matcher.

Enter your regex:     
Enter input string to search: xx
I found the text "" starting at index 0 and ending at index 0.
I found the text "" starting at index 1 and ending at index 1.
I found the text "" starting at index 2 and ending at index 2.

Enter your regex: 
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: x??
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.
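For comparison, a small Python sketch of the first match attempt only. It uses re.match, which tries the pattern once at index 0, so it says nothing about how an engine continues scanning after an empty match.

```python
import re

text = "xx"

# Greedy x? takes the x when one is available at the current position.
greedy = re.match(r"x?", text).group(0)
# Reluctant x?? on its own prefers to take nothing.
reluctant = re.match(r"x??", text).group(0)
# The empty pattern also matches the empty string at index 0.
empty = re.match(r"", text).group(0)

print(repr(greedy))     # 'x'
print(repr(reluctant))  # ''
print(repr(empty))      # ''
```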
