python - Python: sentence split produces a space

Question

So I have some sentences, for example:

The window is over there. The lamp is on. The fire is burning.

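When I split it with split('.') and then join the pieces with newlines, it loses the ".".

Then I tried the regular expression (?<=\.)\s, but it produces a space before the first letter of the second and third sentences: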

The window is over there.
 The lamp is on.
 The fire is burning.

I don't want that extra space. I want:

The window is over there.
The lamp is on.
The fire is burning.

Thanks

score 3 · Accepted Answer

>>> test = "The window is over there. The lamp is on. The fire is burning."
>>> print(test.replace(". ", ".\n"))
The window is over there.
The lamp is on.
The fire is burning.

score 3 · Accepted Answer

".\n".join(i.strip() for i in a.split("."))
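
A runnable sketch of the same idea, assuming `a` holds the sentence string (the function name `split_lines` is only illustrative). Because the text ends with a dot, `split(".")` yields a trailing empty piece, so the plain join ends in `.\n`; skipping empty pieces avoids that:

```python
def split_lines(a):
    # Split on dots and strip the whitespace around each piece.
    pieces = [i.strip() for i in a.split(".")]
    # Skip empty pieces (such as the one after the final dot)
    # and put the dot back on each sentence.
    return "\n".join(p + "." for p in pieces if p)


a = "The window is over there. The lamp is on. The fire is burning."
print(split_lines(a))
```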

answered 2013-01-13T20:50:00.820

score 1 · Accepted Answer

Obviously this doesn't handle special cases (i.e. no space after a period), but why not just do this:

>>> s = 'The window is over there. The lamp is on. The fire is burning.'
>>> print(s.replace('. ', '.\n'))
The window is over there.
The lamp is on.
The fire is burning.

score 1 · Accepted Answer

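There are several ways to split the input: strip after splitting, split with a regular expression, or use a simple search and replace.

The first option is probably the most intuitive: you split the string on the dot, as you already did, then strip each resulting string to remove any whitespace and restore the trailing dot. In Python: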

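text = 'The window is over there. The lamp is on. The fire is burning.'
# Split on dots, skip empty pieces, strip whitespace, and restore the dot.
sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]
print('\n'.join(sentences))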

A second and simpler approach is to simply replace '. ' with '.\n':

# text is the string of sentences to reformat
print(text.replace('. ', '.\n'))

This will work with your input, but will fail if someone uses two spaces to separate sentences (which some people prefer).
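
To illustrate that failure case, here is a small sketch with two spaces after the first sentence (the sample string is made up for this illustration). Only the first space after each dot is replaced, so the second one survives as leading whitespace:

```python
text = "The window is over there.  The lamp is on. The fire is burning."

# ". " matches the dot plus ONE space; any further space stays in place
# and ends up at the start of the next line.
result = text.replace(". ", ".\n")
lines = result.split("\n")
print(lines)
```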

The final and most flexible approach is to use a regular expression to split on the combination of a dot and whitespace:

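import re
# text is the string of sentences; split after each dot, consuming any
# whitespace that follows, and drop the empty piece after the final dot.
sentences = [s for s in re.split(r'(?<=\.)\s*', text) if s]
print('\n'.join(sentences))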

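Notice the important difference with your regular expression: I used \s* to consume all possible whitespace. This matters in cases where there are two or more spaces, or none at all. Note that splitting where there is no whitespace at all relies on re.split splitting on empty matches, which it only does in Python 3.7 and later.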
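
As a rough sketch of how this behaves across different spacing, the helper below (the name `split_sentences` and the sample strings are assumptions for illustration) wraps the same pattern and filters out the empty string that the match after the final dot produces. It assumes Python 3.7 or later for the no-whitespace case:

```python
import re


def split_sentences(text):
    # Split after every dot, consuming any whitespace that follows it.
    # Splitting on an empty match (no whitespace after the dot) needs Python 3.7+.
    parts = re.split(r"(?<=\.)\s*", text)
    # Drop the empty string produced by the match after the final dot.
    return [p for p in parts if p]


for sample in ("One. Two. Three.", "One.  Two.   Three.", "One.Two.Three."):
    print(split_sentences(sample))
```

Each of the three samples should come out as the same three sentences, with no leading whitespace.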
