python - How to use a regular expression to find a specific word in text and return all occurrences?

Question

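Just as the title says.

I'm new to Python and regular expressions. I need to search a paragraph for a specific word and display the indices of all its occurrences.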

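For example:

The paragraph is:

This is a testing text and used to test and test and test.

and the word is:

test

The algorithm should return the indices of the three non-overlapping occurrences of the word test in the paragraph above (but not testing, because I mean to search for whole words, not just substrings).

Another example, with the same paragraph and this "word":

test and

The algorithm should return the 2 occurrences of test and.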

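I think I have to use some regular expression to find the pattern of a whole word that is preceded and followed by punctuation such as . , ; ? -

After googling, I found that something like re.finditer should be used, but it seems I haven't found the right way yet. Please help, and thanks in advance. ;)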

score 6 · Accepted Answer

Yes, finditer is the way to go. Use start() to get the index of each match.

Example:

import re

a = "This is a testing text and used to test and test and test."
# \b anchors the match at word boundaries, so "testing" is skipped
print([m.start() for m in re.finditer(r"\btest\b", a)])
print([m.start() for m in re.finditer(r"\btest and\b", a)])

Output:

[35, 44, 53]
[35, 44]
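If the word or phrase comes from user input rather than being hard-coded, the same idea can be wrapped in a small helper. This is only a sketch: the name find_whole_word is hypothetical (it is not part of the re module), and it assumes the phrase starts and ends with a word character, since \b only matches next to one. re.escape keeps any regex metacharacters in the phrase literal.

```python
import re

def find_whole_word(phrase, text):
    """Return the start indices of non-overlapping whole-word matches."""
    # re.escape makes characters such as "." or "?" in the phrase literal;
    # \b on both sides requires the match to sit on word boundaries.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    return [m.start() for m in pattern.finditer(text)]

text = "This is a testing text and used to test and test and test."
print(find_whole_word("test", text))      # [35, 44, 53]
print(find_whole_word("test and", text))  # [35, 44]
```

Use m.span() instead of m.start() in the list comprehension if the end index of each match is needed as well.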

score 3 · Accepted Answer

Use the word-boundary anchor \b in your regular expression to indicate that you want the match to start and end at a word boundary.

>>> import re
>>> sentence = "This is a testing text and used to test and test and test."
>>> pattern = re.compile(r'\btest\b')
>>> [m.start() for m in pattern.finditer(sentence)]
[35, 44, 53]
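On the punctuation mentioned in the question: \b matches between a word character (a letter, digit, or underscore) and anything else, so . , ; ? - do not need to be listed explicitly. A small check, using a made-up sample string that is not from the question:

```python
import re

pattern = re.compile(r"\btest\b")
# Hypothetical sample: "test" next to various punctuation marks, plus two non-matches
samples = "test. test, test; test? pre-test tested _test"
starts = [m.start() for m in pattern.finditer(samples)]
print(starts)  # [0, 6, 12, 18, 28]
```

Note that "tested" is skipped because "e" follows "test", and "_test" is skipped because the underscore counts as a word character.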
