python - Regex to exclude words in Python

Question

I have a regular expression '[\w_-]+' that allows alphanumeric characters, underscores, or hyphens.

I have a set of words in a Python list that I don't want to allow:

listIgnore = ['summary', 'config']

What changes do I need to make to the regex?

P.S.: I'm new to regex.

score 3 · Accepted Answer

>>> import re
>>> line = "This is a line containing a summary of config changes"
>>> listIgnore = ['summary', 'config']
>>> patterns = "|".join(listIgnore)
>>> print(re.findall(r'\b(?!(?:' + patterns + r')\b)[\w_-]+', line))
['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
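The same lookahead technique can be wrapped in a small reusable function. This is a sketch under a few assumptions: the function name find_allowed_words is hypothetical, the ignored words are escaped with re.escape in case they ever contain regex metacharacters, and the \b after the alternation means only whole-word matches are excluded (so "configuration" is kept while "config" is dropped).

```python
import re

def find_allowed_words(text, ignore):
    """Hypothetical helper: return the [\\w_-]+ words in text that are not in ignore."""
    if not ignore:
        # An empty alternation would make the lookahead reject every word.
        return re.findall(r"[\w_-]+", text)
    # Escape each word so characters like '.' or '+' are matched literally.
    alternation = "|".join(re.escape(word) for word in ignore)
    # \b(?!(?:...)\b) rejects a word only when it equals an ignored word;
    # without the inner \b, "configuration" would be rejected as well.
    pattern = r"\b(?!(?:" + alternation + r")\b)[\w_-]+"
    return re.findall(pattern, text)

listIgnore = ['summary', 'config']
print(find_allowed_words("a summary of configuration and config changes", listIgnore))
# ['a', 'of', 'configuration', 'and', 'changes']
```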

score 2 · Accepted Answer

This question piqued my interest, so I took a stab at it:

'^(?!summary)(?!config)[\w_-]+$'

Note that this only works if you want to match the regex against the complete string:

>>> import re
>>> print(re.match(r'^(?!summary)(?!config)[\w_-]+$', 'config_test'))
None
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'confi_test')
<re.Match object; span=(0, 10), match='confi_test'>

So, to use your list, just add another (?!<word here>) after the ^ in your regex for each word. These are called lookaheads; here is some good information on them.
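Building those lookaheads by hand gets tedious as the list grows, so here is a minimal sketch that generates one (?!word) per entry in listIgnore and places them all right after the ^, as described above. The use of re.escape is an added assumption, in case a word contains regex metacharacters.

```python
import re

listIgnore = ['summary', 'config']

# One (?!word) lookahead per ignored word, all placed right after ^.
lookaheads = "".join("(?!" + re.escape(word) + ")" for word in listIgnore)
pattern = re.compile("^" + lookaheads + r"[\w_-]+$")

for candidate in ["config_test", "confi_test", "summary", "notes"]:
    # match() returns None when one of the lookaheads fails.
    print(candidate, pattern.match(candidate) is not None)
```

Because each lookahead only checks how the string starts, this rejects anything that begins with an ignored word (which is why config_test fails). If only exact matches should be rejected, each lookahead can be anchored as (?!word$) instead.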

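If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible. The regex will simply pick out a part of the word that doesn't match the exclusion, for example "ummary" from "summary".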
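The "ummary" problem is easy to reproduce. The sketch below compares the unanchored lookaheads with a variant that uses word boundaries (\b) before the match and after the excluded word inside the lookahead, assuming a "word" is a run of [\w_-] characters; the boundary stops the engine from restarting the match mid-word.

```python
import re

text = "a summary of config changes"

# Without anchors, the lookaheads only block a match that *starts* at "s"
# or "c", so the regex engine simply starts one character later.
unanchored = re.findall(r"(?!summary)(?!config)[\w_-]+", text)
print(unanchored)  # ['a', 'ummary', 'of', 'onfig', 'changes']

# \b forces each match to begin at a word start, and the inner \b makes
# the exclusion apply to whole words only.
bounded = re.findall(r"\b(?!(?:summary|config)\b)[\w_-]+", text)
print(bounded)  # ['a', 'of', 'changes']
```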

Obviously, the more words you exclude, the less efficient the regex becomes. There may be a better way to do this.
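One alternative, which is not taken from either answer, is to keep the regex simple and do the exclusion in Python with a set. The pattern then stays the same size no matter how many words are ignored, and each membership check is a hash lookup. This sketch assumes exact, case-sensitive matching ("Summary" would not be filtered).

```python
import re

listIgnore = ['summary', 'config']
ignored = set(listIgnore)  # hash-based membership checks

line = "This is a line containing a summary of config changes"
# Tokenize with the original character class, then filter in Python.
words = [w for w in re.findall(r"[\w_-]+", line) if w not in ignored]
print(words)  # ['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
```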
