python - Replacing characters in a string: how do I use a regular expression?

Question

I'm using this to remove the spaces before punctuation marks. It works, but the code is very inelegant:

my_string = "There , are , many , wrong . spaces , before interpunction  marks !"

my_string.replace(" ,", ",").replace(" .", ".").replace(" !", "!").replace(" ?", "?")

Now I'm trying to come up with a more elegant solution using a regular expression. But all I've got so far is:

import re
my_string = re.sub(r"[\s]+[,.!?]", XXX, my_string)

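I just don't understand what to put in place of XXX so that each whitespace+mark is replaced with the corresponding mark. Or how to simply remove every space that comes before a mark; that would work too...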
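For the second option in the question (simply deleting the whitespace in front of each mark), a lookahead avoids the need for a capturing group: (?=[,.!?]) requires a mark to follow but does not consume it, so only the whitespace is replaced with an empty string. A minimal sketch; the variable name cleaned is illustrative only:

```python
import re

my_string = "There , are , many , wrong . spaces , before interpunction  marks !"

# \s+ matches the run of whitespace; the lookahead (?=[,.!?]) checks that a
# punctuation mark comes next without including it in the match, so replacing
# the match with "" deletes only the whitespace.
cleaned = re.sub(r"\s+(?=[,.!?])", "", my_string)

print(cleaned)  # There, are, many, wrong. spaces, before interpunction  marks!
```

The double space between "interpunction" and "marks" is left alone, because it is not followed by a mark.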

score 2 · Accepted Answer

You want to create a capturing group for the punctuation mark and then reference that group in the replacement string:

re.sub(r'\s+([,.!?])', r'\1', my_string)

You don't need the square brackets around \s; it is already a character class.

Demo:

>>> import re
>>> my_string = "There , are , many , wrong . spaces , before interpunction  marks !"
>>> re.sub(r'\s+([,.!?])', r'\1', my_string)
'There, are, many, wrong. spaces, before interpunction  marks!'
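To check the remark that the brackets around \s are redundant, one can run both spellings of the pattern on the question's string and compare the results. A small sketch; the variable names are illustrative:

```python
import re

my_string = "There , are , many , wrong . spaces , before interpunction  marks !"

# [\s] is a character class whose only member is the \s class,
# so it matches exactly the same characters as \s on its own.
with_brackets = re.sub(r"[\s]+([,.!?])", r"\1", my_string)
without_brackets = re.sub(r"\s+([,.!?])", r"\1", my_string)

print(with_brackets == without_brackets)  # True
```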

score 1 · Accepted Answer

You need to capture the punctuation mark with parentheses and then use \1 in the replacement:

import re
my_string = "There , are , many , wrong . spaces , before interpunction  marks !"
my_string = re.sub(r"[\s]+([,.!?])", r"\1", my_string)
print(my_string)  # There, are, many, wrong. spaces, before interpunction  marks!
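One detail that applies to all of these patterns: \s matches any whitespace, not just the space character, so a tab or newline in front of a mark is removed as well. If only ordinary spaces should be removed, a literal space can be used instead. A sketch with a made-up input string text:

```python
import re

text = "Hello ,\tworld\n!"

# \s matches any whitespace, including tabs and newlines.
any_ws = re.sub(r"\s+([,.!?])", r"\1", text)
# A literal space followed by + only matches runs of ordinary spaces.
spaces_only = re.sub(r" +([,.!?])", r"\1", text)

print(repr(any_ws))       # 'Hello,\tworld!'
print(repr(spaces_only))  # 'Hello,\tworld\n!'
```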

score 0 · Accepted Answer

Add a capturing group:

[\s]+([,.!?])

Then use it in the replacement:

\1

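\n refers to the nth capturing group. In Python's re.sub, the whole match is written \g<0>; \0 is read as an octal escape and inserts a NUL character instead.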
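A short sketch comparing the three replacement forms on a fragment of the question's string (the names s and pattern are illustrative):

```python
import re

s = "wrong . spaces"
pattern = r"\s+([,.!?])"

group_one = re.sub(pattern, r"\1", s)    # text captured by group 1 only
whole = re.sub(pattern, r"[\g<0>]", s)   # \g<0> is the entire match
octal = re.sub(pattern, r"\0", s)        # \0 is an octal escape, i.e. NUL

print(group_one)    # wrong. spaces
print(whole)        # wrong[ .] spaces
print(repr(octal))  # 'wrong\x00 spaces'
```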

score 0 · Accepted Answer

The last line should look like this:

my_string = re.sub(r"\s+([,.!?])", r'\1', my_string)

The parentheses in the pattern form a group, and you then refer to that group as \1 because it is the first and only group.
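Because the backreference number depends on how many groups the pattern contains, it may help to see a pattern with two groups, and one with a named group ((?P<name>...), referenced as \g<name>), which avoids counting altogether. A sketch; the group name mark is arbitrary:

```python
import re

my_string = "There , are , many , wrong . spaces , before interpunction  marks !"

# Groups are numbered by their opening parenthesis, left to right.
# Here group 1 is the whitespace and group 2 is the mark, so the
# replacement has to reference \2 rather than \1.
two_groups = re.sub(r"(\s+)([,.!?])", r"\2", my_string)

# A named group is referenced as \g<name>, so no counting is needed.
named = re.sub(r"\s+(?P<mark>[,.!?])", r"\g<mark>", my_string)

print(two_groups == named)  # True
```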
