python - Python regular expression to exclude the underscore

Question

I need to find all two-character symbols in Unicode text, except the underscore. My current solution is:

pattern = re.compile(ur'(?:\s*)(\w{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall('a b c ab cd vs sd a a_ _r')
['ab', 'cd', 'vs', 'sd', 'a_', '_r']

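I need to exclude the underscore _ from the regex so that a_ and _r are not found. The problem is that my characters can be in any language, so I can't use a regex like [^a-zA-Z]. For example, in Russian: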

print pattern.findall(u'ф_')

score 12 · Accepted Answer

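Exclude any non-word character and _ in a negated class, which leaves exactly the word characters other than the underscore:

[^\W_]

instead of

\w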
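
To make the substitution concrete, here is a minimal sketch run against the inputs from the question. It assumes Python 3, where str patterns are Unicode-aware by default, so the ur'' prefix and re.UNICODE are dropped. The no-op (?:\s*) groups from the original pattern are also left out for brevity. That simplification is mine, not part of the answer.

```python
import re

# [^\W_] reads as "not (a non-word character or an underscore)",
# i.e. any Unicode word character except "_".
pattern = re.compile(r'[^\W_]{2}')

ascii_pairs = pattern.findall('a b c ab cd vs sd a a_ _r')
russian_pairs = pattern.findall('ф_')
mixed_pairs = pattern.findall('фы ab_')

print(ascii_pairs)    # ['ab', 'cd', 'vs', 'sd']
print(russian_pairs)  # []
print(mixed_pairs)    # ['фы', 'ab']
```

Like the pattern in the question, this finds any two adjacent matching characters, so a longer word would be split into pairs.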

answered 2012-09-25T19:35:19.430

score 9 · Accepted Answer

Your best bet is to use the new regex module instead. One of its features is that it can remove characters from a character set:

import regex as re

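# re.V1 selects the regex module's VERSION1 behaviour, which enables set operations such as --
pattern = re.compile(ur'(?:\s*)([\w--_]{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL | re.V1)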

The [\w--_] syntax creates a character set that is the same as \w, but with the underscore removed from the matching characters.
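
As a rough Python 3 sketch of the same idea: the helper name pairs_without_underscore is hypothetical, and the third-party regex package is assumed to be installed with pip install regex. As far as I understand the regex documentation, set operations such as -- belong to its VERSION1 behaviour, so the sketch passes regex.V1 explicitly. If the package is missing, it falls back to the standard-library negated class [^\W_], which should match the same characters.

```python
import re

try:
    import regex  # third-party package, assumed installed: pip install regex
except ImportError:
    regex = None  # fall back to the standard library below


def pairs_without_underscore(text):
    """Return every run of two adjacent word characters, excluding '_'."""
    if regex is not None:
        # Set difference: everything in \w minus the underscore.
        # regex.V1 turns on the module's VERSION1 behaviour (set operations).
        return regex.findall(r'[\w--_]{2}', text, flags=regex.V1)
    # Standard-library equivalent: negate "non-word or underscore".
    return re.findall(r'[^\W_]{2}', text)


found = pairs_without_underscore('a b c ab cd vs sd a a_ _r фы')
print(found)
```

Both branches are meant to return the same list, so the choice between them is mostly about whether an extra dependency is acceptable.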

score 0 · Accepted Answer

This seems to work for me:

a="Exclude_from_search"
re.search("(\w[^_]+)", a).group(0)
'Exclude'

answered 2018-04-30T17:12:03.367
