regex - Regular expression: negative lookahead between two matches

Question

I'm trying to build a regular expression that looks somewhat like this:

[match-word] ... [exclude-specific-word] ... [match-word]

This seems like a job for a negative lookahead, but I run into trouble when I have a case like this:

[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]

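I want the above sentence to match, but the negative lookahead between the first and second matched words "spills over", so the second word never matches.

Let's look at a practical example.

I want to match every sentence that contains the word "i" and the word "pie", but not the word "hate" between those two words. I have these three sentences: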

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this

I have this regex:

^i(?!.*hate).*pie          - have removed the word boundaries for clarity, original is: ^i\b(?!.*\bhate\b).*\bpie\b

This matches the first sentence but not the second, because the negative lookahead scans the whole string.
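A minimal sketch reproducing the problem with Python's `re` module (an assumption: the question uses JRegex, but these constructs should behave the same way there). The lookahead `(?!.*hate)` is checked once, right after the leading `i`, and the `.*` inside it can run to the end of the line, so a `hate` anywhere later rejects the sentence:

```python
import re

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
]

# Pattern from the question, without the word boundaries.
rx = re.compile(r"^i(?!.*hate).*pie")

# True where the pattern matches the sentence.
results = [bool(rx.search(s)) for s in sentences]
for s, matched in zip(sentences, results):
    print(matched, "<-", s)
```

Only the first sentence matches; the second is rejected even though its `hate` comes after `pie`.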

Is there a way to limit the negative lookahead so that it is satisfied if it encounters "pie" before it encounters "hate"?

Note: in my implementation, there may be other terms following this regex (it is built dynamically from a grammar search engine), for example:

^i(?!.*hate).*pie.*donuts

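I'm currently using JRegex, but could probably switch to JDK Regex if necessary.

Update: I forgot to mention something in my original question:

The "negative construct" may also appear further along in the sentence, and if possible I do want to match the sentence even when the "negative" construct exists further along.

To clarify, look at these sentences: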

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this

rob's answer works perfectly for this extra constraint, so I'm accepting that one.

score 5 · Accepted Answer

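At every character between your start word and your stop word, you have to make sure that it doesn't match your negative word or your stop word. Like this (I've included a little whitespace for readability):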

^i ( (?!hate|pie) . )* pie

Here is a Python program to test it.

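import re

# (sentence, should it match?) pairs from the question
test = [ ('i sure like eating pie, but i love donuts', True),
         ('i sure like eating pie, but i hate donuts', True),
         ('i sure hate eating pie, but i like donuts', False) ]

# re.X (verbose mode) makes the engine ignore the whitespace in the pattern
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)

for t, v in test:
    m = rx.match(t)
    # "pass" when the actual result agrees with the expected one
    print(t, "pass" if bool(m) == v else "fail")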
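Since the question's original pattern uses word boundaries and is built dynamically, here is a sketch of the same idea with `\b` added, also checked against the fourth sentence from the update. The helper name `between_without` and its parameters are made up for illustration; they are not part of any library.

```python
import re

def between_without(start, stop, excluded):
    """Hypothetical helper: `start` ... `stop` with no `excluded` word in between."""
    s, e, x = (re.escape(w) for w in (start, stop, excluded))
    # Before consuming each character, refuse to step onto the excluded word
    # or the stop word, so the check ends at the first `stop`.
    return rf"^{s}\b(?:(?!\b(?:{x}|{e})\b).)*\b{e}\b"

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

rx = re.compile(between_without("i", "pie", "hate"))
results = [bool(rx.search(s)) for s in sentences]
print(results)  # [True, True, False, True]
```

Further terms can be appended to the returned string as in the question, for example `between_without("i", "pie", "hate") + r".*\bdonuts\b"`.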

score 3 · Answer

This regex should work for you:

^(?!i.*hate.*pie)i.*pie.*donuts

Explanation

"^" +          // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"(?!" +        // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
   "i" +          // Match the character “i” literally
   "." +          // Match any single character that is not a line break character
      "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   "hate" +       // Match the characters “hate” literally
   "." +          // Match any single character that is not a line break character
      "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   "pie" +        // Match the characters “pie” literally
")" +
"i" +          // Match the character “i” literally
"." +          // Match any single character that is not a line break character
   "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"pie" +        // Match the characters “pie” literally
"." +          // Match any single character that is not a line break character
   "*" +          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"donuts"       // Match the characters “donuts” literally
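To see how this pattern behaves, here is a small check in Python (an assumption: the question targets JRegex, but the constructs used here should work the same in Python's `re`). The lookahead is evaluated at the start of the line and its `.*` can span the whole line, so it rejects any line in which a `hate` is followed somewhere by a `pie`. That includes the fourth sentence from the question's update.

```python
import re

rx = re.compile(r"^(?!i.*hate.*pie)i.*pie.*donuts")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

# True where the pattern matches the sentence.
results = [bool(rx.search(s)) for s in sentences]
print(results)  # [True, True, False, False]
```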

score 2 · Answer

To match ...A...B... with no C in between:

Tested in Python:

$ python
>>> import re
>>> re.match(r'.*A(?!.*C.*B).*B', 'C A x B C')
<_sre.SRE_Match object at 0x94ab7c8>

So I ended up with this regex:

.*\bi\b(?!.*hate.*pie).*pie
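A quick way to try it on the question's sentences, as a sketch using `re.match` like the session above. Here the lookahead sits after the `i` rather than at the start of the line, and the leading `.*` lets the engine try every standalone `i` as the starting word. Every `i` in the fourth sentence is still followed by `hate ... pie`, so that sentence is rejected as well.

```python
import re

rx = re.compile(r".*\bi\b(?!.*hate.*pie).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

# re.match anchors at the start of the string; the leading .* absorbs any prefix.
results = [bool(rx.match(s)) for s in sentences]
print(results)  # [True, True, False, False]
```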
