
OK regex gurus, I have a long text and I'm trying to add quotation marks to sentences that contain "he said" and similar variants.

For example:

s = 'This should have no quotes. This one should he said. But this one should not. Neither should this. But this one should she said.'

should result in:

This should have no quotes. "This one should," he said. But this one should not. Neither should this. "But this one should," she said.

So far I can get pretty close, but not quite right:

>>> import re
>>> m = re.sub(r'\.\W(.*?) (he|she|it) said.', r'. "\1," \2 said.', s)

The result is:

>>> print(m)
This should have no quotes. "This one should," he said. But this one should not. "Neither should this. But this one should," she said.

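As you can see, it correctly puts quotes around the first instance, but in the second instance it opens the quote too early. Any help is appreciated!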

1 Answer

The comments point out a few other valid cases, but to address the problem you're facing:

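It quotes two sentences because the second match starts at the period at the end of "one should not.", and .*? is then free to run past the next period (after "Neither should this") until it reaches " she said.". Non-greedy matching only keeps the match as short as possible from where it starts; it does not move the start closer to "she said". What you really want is to quote only back to the previous period, so make sure the captured group cannot contain a period, like this: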

m = re.sub(r'\.\W([^.]*?) (he|she|it) said\.', r'. "\1," \2 said.', s)
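A self-contained Python 3 sketch comparing the question's pattern with the fixed one on the question's own text. Here the dot after "said" is escaped so it only matches a literal period (an unescaped . matches any character); the variable names old, new and result are just for illustration.

```python
import re

# The example text from the question.
s = ('This should have no quotes. This one should he said. '
     'But this one should not. Neither should this. '
     'But this one should she said.')

# Original attempt: once a match starts at a period, .*? can run past the next period.
old = re.compile(r'\.\W(.*?) (he|she|it) said\.')
# Fix: the captured text may not contain a period, so it cannot span two sentences.
new = re.compile(r'\.\W([^.]*?) (he|she|it) said\.')

# Show what each pattern captures as the text to be quoted.
for name, pattern in (('old', old), ('new', new)):
    print(name, [m.group(1) for m in pattern.finditer(s)])

# Apply the fixed pattern; the replacement restores the consumed ". " prefix.
result = new.sub(r'. "\1," \2 said.', s)
print(result)
```

Running it shows the old pattern capturing "Neither should this. But this one should" as a single quote, while the new one captures only "But this one should".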

This will fail for sentences that contain a period of their own, such as "Dr. Seuss likes to eat, she said", but that is a separate problem.
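A quick Python 3 check of that limitation, using a made-up input (not from the question). The comma is dropped from the example so the sentence has the same shape as the question's; the fixed pattern is used with the trailing dot escaped.

```python
import re

# The fixed pattern: the captured text may not contain a period.
pattern = re.compile(r'\.\W([^.]*?) (he|she|it) said\.')

# Hypothetical input: the quoted sentence itself contains an abbreviation with a period.
s2 = 'That was fine. Dr. Seuss likes to eat she said.'

result2 = pattern.sub(r'. "\1," \2 said.', s2)
# The opening quote lands after "Dr." because [^.] stops the capture at that period.
print(result2)
```

The output is 'That was fine. Dr. "Seuss likes to eat," she said.', with the quote opening one word too late.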

answered 2013-11-07T00:51:05.977