regex - Regular expression to match words that occur twice in a text

Question

I need to match the words in an English text that occur twice. I tried

(^|\ )([^\ ][^\b]*\b).*\ \2\b

but it doesn't match all the lines it should.

score 3 · Accepted Answer

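There are several problems with your regex. For example, \b does not act as a word boundary inside a character class (in most flavors it means the backspace character there), so [^\b]* won't work as you expect.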
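A quick check in Python's `re` module (assumed here only for illustration, since the question doesn't name a flavor): inside a character class `\b` is the backspace `\x08`, so `[^\b]*` just means "any run of non-backspace characters" and runs straight past word ends.

```python
import re

# Inside a character class, \b is a backspace (\x08), not a word boundary.
print(re.fullmatch(r"[\b]", "\x08") is not None)  # True
print(re.fullmatch(r"[\b]", "b") is not None)     # False

# So [^\b]* consumes spaces and punctuation too, swallowing the whole string.
m = re.match(r"[^\b]*", "foo bar, baz")
print(m.group())  # foo bar, baz
```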

You probably want something like

(?s)\b(\w+)\b.*\b\1\b

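This will match the entire text from the first occurrence of a repeated word to its last occurrence. That's probably not what you actually want.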
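To see the greedy behavior, here is a small sketch using Python's `re` module and a made-up sample sentence (both are assumptions for illustration). The `.*` runs to the end of the string and then backtracks only as far as the last `the`.

```python
import re

text = "the cat saw the dog and the bird"
greedy = re.compile(r"(?s)\b(\w+)\b.*\b\1\b")

m = greedy.search(text)
print(m.group(1))  # the
print(m.group(0))  # the cat saw the dog and the
```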

Another idea:

(?s)\b(\w+)\b.*?\b\1\b

This will match only the text from the first occurrence of a word to its next occurrence.
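The same made-up sentence with the lazy `.*?` (again using Python's `re` purely as an illustrative flavor) stops at the second `the` instead of the last one:

```python
import re

text = "the cat saw the dog and the bird"
lazy = re.compile(r"(?s)\b(\w+)\b.*?\b\1\b")

m = lazy.search(text)
print(m.group(1))  # the
print(m.group(0))  # the cat saw the
```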

The problem with both approaches is that, for example, in the text

foo bar bar foo

the regex will match from foo to foo, blindly ignoring the repeated bar in between.
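A sketch of why `bar` gets lost, assuming Python's `re.finditer`: each new search resumes after the end of the previous match, so words consumed inside a match are never tried as the start of their own match.

```python
import re

text = "foo bar bar foo"
lazy = re.compile(r"(?s)\b(\w+)\b.*?\b\1\b")

# The single match spans the whole string, so "bar" is never reported.
found = [m.group(1) for m in lazy.finditer(text)]
print(found)  # ['foo']
```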

So if you really want to find all words that are repeated, use

(?s)\b(\w+)\b(?=.*?\b\1\b)

Explanation:

(?s)       # Allow the dot to match newlines
\b(\w+)\b  # Match an entire word
(?=        # Assert that the following regex can be matched from here:
 .*?       #  Any number of characters (as few as possible)
 \b\1\b    #  followed by the word that was previously captured
)          # End of lookahead
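A minimal usage sketch, assuming Python's `re` module: because the lookahead consumes nothing past the word itself, scanning continues right after each word, and `findall` (which returns the capture group) reports every word that has a later copy.

```python
import re

pattern = re.compile(r"(?s)\b(\w+)\b(?=.*?\b\1\b)")

print(pattern.findall("foo bar bar foo"))  # ['foo', 'bar']

# A word occurring three times is reported once per occurrence that still
# has a later copy; dict.fromkeys drops the duplicates while keeping order.
hits = pattern.findall("a a a b")
print(hits)                       # ['a', 'a']
print(list(dict.fromkeys(hits)))  # ['a']
```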
