python - Get a list of what is NOT matched by a regular expression?

Question

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r'\w+', DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

I would like to get a separate list of whatever lies between the matched words:

[", ", " - ", " ", " ", " ", " ", "!?"]

How can I do that?

score 5 · Accepted Answer

print(re.findall(r'\W+', DATA))  # note, UPPER-case "W"

This produces the list you are looking for:

[', ', ' - ', ' ', ' ', ' ', ' ', '!?']

I used \W+ instead of \w+, which negates the character class you were using.

   \w  Matches word characters, i.e., letters, digits, and underscores.
   \W  Matches non-word characters, i.e., the negated version of \w
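
As a quick check of the "negated version" relationship, here is a minimal sketch that compares \W+ with the explicit negated class [^\w]+ on the question's string. The variable names are only illustrative.

```python
import re

DATA = "Hey, you - what are you doing here!?"

# \W is the complement of \w, so [^\w]+ should select the same runs as \W+.
separators = re.findall(r'\W+', DATA)
same_separators = re.findall(r'[^\w]+', DATA)

print(separators)                     # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']
print(separators == same_separators)  # True
```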

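This regex reference table may help you choose the best character class or metacharacter for your regex searches and matches. Also, see this tutorial for more information (especially the reference section at the bottom of the page).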

score 3 · Accepted Answer

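How about using the complement of \w, which is \W? Also, rather than building separate lists, it may be more efficient to get everything in one pass. (Of course, this depends on what you intend to do with it.)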

>>> re.findall(r'(\w+)(\W+)', DATA)
[('Hey', ', '), ('you', ' - '), ('what', ' '), ('are', ' '), ('you', ' '), ('doing', ' '), ('here', '!?')]

If you really do want separate lists, just zip it:

>>> list(zip(*re.findall(r'(\w+)(\W+)', DATA)))
[('Hey', 'you', 'what', 'are', 'you', 'doing', 'here'), (', ', ' - ', ' ', ' ', ' ', ' ', '!?')]
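
One caveat worth sketching: the pattern (\w+)(\W+) only pairs a word with a separator that follows it, so a string ending in a word would lose that last word. The variant below is an assumption on my part, not part of the answer: it uses \W* so the separator may be empty, and a hypothetical input that ends in a word. Text before the first word would still be skipped.

```python
import re

# Hypothetical variant of the question's string that ends in a word.
DATA = "Hey, you - what are you doing"

# \W* (zero or more) lets the final word match even with no separator after it.
pairs = re.findall(r'(\w+)(\W*)', DATA)

# zip(*pairs) transposes the pairs; on Python 3 zip is lazy, so build lists.
words, separators = (list(group) for group in zip(*pairs))

print(words)       # ['Hey', 'you', 'what', 'are', 'you', 'doing']
print(separators)  # [', ', ' - ', ' ', ' ', ' ', '']
```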

score 0 · Accepted Answer

import re
DATA = "Hey, you - what are you doing here!?"
print(re.split(r'\w+', DATA))
# Prints ['', ', ', ' - ', ' ', ' ', ' ', ' ', '!?']

You may also want to filter out the empty strings to match exactly what you asked for.
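
A minimal sketch of that filtering step, assuming a list comprehension is acceptable (the names pieces and separators are illustrative):

```python
import re

DATA = "Hey, you - what are you doing here!?"

# re.split yields an empty string wherever the text starts or ends with a match.
pieces = re.split(r'\w+', DATA)

# Keep only the non-empty pieces, i.e. the text between the words.
separators = [piece for piece in pieces if piece]

print(separators)  # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']
```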
