regex - 正则表达式前瞻 .*

Question

我有一个模式需要找到 string1 的最后一次出现，除非在主题的任何地方找到 string2，然后它需要 string1 的第一次出现。为了解决这个问题，我写了这个低效的负前瞻。

/(.(?!.*?string2))*string1/

运行需要几秒钟（对于缺少任何一个字符串的主题来说，这个时间太长了）。有没有更有效的方法来实现这一点？

score 3 · Accepted Answer

您应该能够使用以下内容：

/string1(?!.*?string2)/

string1只要string2稍后在字符串中找不到，这将匹配，我认为这符合您的要求。

编辑：看到您的更新后，请尝试以下操作：

/.*?string1(?=.*?string2)|.*string1/

score 2 · Accepted Answer

好的，现在我已经明白你想要什么了，有点长但优化得很快：

nutria\d.    -> string1
RABBIT       -> string2

模式（PHP 中的示例）：

$pattern = <<<LOD
~(?J)               # allow multiple capture groups with the same name

  ### capture the first nutria if RABBIT isn't found before ###
  ^ (?>[^Rn]++|R++(?!ABBIT)|n++(?!utria\d.))* (?<res>nutria\d.) 

  ### try to capture the last nutria without RABBIT until the end ###
  (?>
     (?>
        (?> [^Rn]++ | R++(?!ABBIT) | n++(?!utria\d.) )*
        (?<res>nutria\d.)
     )*                         # repeat as possible to catch the last nutria
     (?> [^R]++ | R++(?!ABBIT) )* $  # the end without RABBIT
  )?   # /!\important/!\  this part is optional, then only the first captured
       # nutria is in the result when RABBIT is found in this part

| # OR

  ### capture the first nutria when RABBIT is found before
  ^(?> [^n]++ | n++(?!utria\d.) )*  (?<res>nutria\d.)

~x
LOD;

$subjects = array( 'groundhog nutria1A beaver nutria1B',
                   'polecat nutria2A badger RABBIT nutria2B',
                   'weasel RABBIT nutria3A nutria3B nutria3C',
                   'vole nutria4A marten nutria4B marmot nutria4C RABBIT');

foreach($subjects as $subject) {
    if (preg_match($pattern, $subject, $match))
        echo '<br/>'.$match['res'];
}

该模式旨在使用原子组和具有交替的所有格量词尽可能快地失败，从而使用尽可能少的前瞻来避免灾难性的回溯（仅当找到 an或 an时R，它会很快失败）

score 2 · Accepted Answer

你也可以在你的正则表达式中做 if/else 语句！

(?(?=.*string2).*(string1).*$|^.*?(string1))

解释：

(?                      # If
    (?=.*string2)       # Lookahead, if there is string2
    .*(string1).*$      # Then match the last string1
    |                   # Else
    ^.*?(string1)       # Match the first string1
)

如果string1找到，您将在第 1 组中找到它。

score 1 · Accepted Answer

试试这个正则表达式：

string1(?!.*?string1)|string1(?=.*?string2)

现场演示：http ://www.rubular.com/r/uAjOqaTkYH

正则表达式图片

在 Debuggex 上实时编辑

score 0 · Accepted Answer

尝试使用所有格运算符.*+，它使用更少的内存（它不存储匹配案例的整个回溯）。因此，它也可能运行得更快。

regex - 正则表达式前瞻 .*

5 回答 5

现场演示：http ://www.rubular.com/r/uAjOqaTkYH

Related

Reference