regex - Get periods and the words before and after them (with overlapping matches)

Question

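I want to get all the periods in some text, together with the words around them. The following text is an example:

This research aims to design the production of isoeugenol and vanillin from the eugenol in clove leaf oil, and to carry out a financial analysis for potential product development. The specific objectives of this research work are: 1. Identification of isoeugenol and vanillin. 2. Model simulation of the process design for isoeugenol and vanillin. 3. Study of financial feasibility and added value. This research is expected to deliver the maximum economic potential of eugenol in order to increase the added value of clove leaf oil. The results show that the FTIR and NMR products confirm that the isoeugenol and vanillin present in the synthesized product are identical to the reference standards.

When I use the pattern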

\w+\.\s\w+

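on the string above, it matches "vanillin. 2" (from the section "and vanillin. 2. Model simulation") but skips "2. Model", because the "2" has already been consumed by the first match.

I want it to match both "vanillin. 2" and "2. Model".

Can you suggest an improvement so that I get all the periods?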
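
The skipped match can be reproduced with a minimal sketch. The question does not name a language, so Python's re module is an assumption here, and the helper name non_overlapping is hypothetical. It shows that an ordinary scan resumes after the end of the previous match, so the "2" is no longer available as the start of the next one:

```python
import re

# The pattern from the question: word, period, one whitespace character, word.
PLAIN_PATTERN = re.compile(r"\w+\.\s\w+")

def non_overlapping(text: str) -> list[str]:
    """Return the ordinary (non-overlapping) matches of the question's pattern."""
    return PLAIN_PATTERN.findall(text)

sample = "isoeugenol and vanillin. 2. Model simulation"

# The first match consumes "vanillin. 2"; scanning then resumes at ". Model",
# so "2. Model" can never be found.
print(non_overlapping(sample))  # ['vanillin. 2']
```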

score 2 · Accepted Answer

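Use a positive lookahead assertion together with a capturing group. The lookahead itself consumes no characters, so the regex engine can attempt a match at every position and the captured texts are allowed to overlap: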

(?=(\b\w+\.(?:\s+\w+|$)))

Use it like this:

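// Each overall match (index 0) is an empty string, because a lookahead consumes nothing.
preg_match_all('/(?=(\b\w+\.(?:\s+\w+|$)))/', $subject, $result, PREG_PATTERN_ORDER);
// The text we want is in capturing group 1.
$result = $result[1];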

Explanation:

(?=       # Assert that the following can be matched at the current position:
 (        # Capture into group number 1:
  \b      # - Beginning of a word
  \w+     # - an alphanumeric word
  \.      # - a dot
  (?:     # - Then either...
   \s+\w+ #   - whitespace and another word
  |       # - or... 
   $      #   - the end of the string.
  )       # End of alternation
 )        # End of capturing group 1
)         # End of lookahead

See it in action on regex101.com.
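
The answer's snippet is PHP. For readers using another language, here is a rough Python equivalent of the same lookahead technique, offered as a sketch and not as part of the original answer. The helper name words_around_periods is hypothetical. With a single capturing group, re.findall returns the group-1 text of each zero-width match:

```python
import re

# Same pattern as in the answer: everything sits inside a lookahead, so each
# match is zero-width and the wanted text is captured by group 1.
OVERLAP_PATTERN = re.compile(r"(?=(\b\w+\.(?:\s+\w+|$)))")

def words_around_periods(text: str) -> list[str]:
    """Return every 'word. word' span, including spans that overlap."""
    return OVERLAP_PATTERN.findall(text)

sample = "isoeugenol and vanillin. 2. Model simulation"
print(words_around_periods(sample))  # ['vanillin. 2', '2. Model']

# The '$' branch also catches a period at the very end of the string.
print(words_around_periods("a b. c. end."))  # ['b. c', 'c. end', 'end.']
```

Note that \s+ in the answer's pattern accepts one or more whitespace characters, whereas the question's pattern used a single \s.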
