python - Split a sentence after words, but with at most n characters per result

Question

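I want to show some text on a scrolling display that is 16 characters wide. To improve readability I want to page through the text, but rather than simply splitting every 16 characters, I would prefer to split at the end of a word or punctuation mark before the 16-character limit is exceeded.

Example: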

text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'

This text should be converted into a list of strings of at most 16 characters each:

result = ['Hello, this is ', 'an example of ', 'text shown in ', 'the scrolling ', 'display. Bla, ', 'bla, bla!']

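I started with the regular expression re.split('(\W+)', text) to get a list of every element (word, punctuation), but I can't work out how to combine them.

Could you help me, or at least give me some hints?

Thanks!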

score 17 · Accepted Answer

I'd take a look at the textwrap module:

>>> text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
>>> from textwrap import wrap
>>> wrap(text, 16)
['Hello, this is', 'an example of', 'text shown in', 'the scrolling', 'display. Bla,', 'bla, bla!']
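Note that wrap() strips the whitespace at the end of each line by default, whereas the result in the question keeps a trailing space. A minimal sketch, assuming that trailing whitespace matters for the display: TextWrapper accepts drop_whitespace=False, which leaves the whitespace in place. The name `pages` is only illustrative.

```python
from textwrap import TextWrapper

text = ('Hello, this is an example of text shown in the scrolling display. '
        'Bla, bla, bla!')

# drop_whitespace=False keeps the whitespace at the end of each wrapped
# line instead of stripping it (stripping is the default behaviour).
wrapper = TextWrapper(width=16, drop_whitespace=False)
pages = wrapper.wrap(text)
print(pages)
```

For this particular text the output should match the list asked for in the question. With other input, a line may also begin with whitespace, because nothing is dropped at either end.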

TextWrapper has a lot of options you can use, for example:

>>> from textwrap import TextWrapper
>>> w = TextWrapper(16, break_long_words=True)
>>> w.wrap("this_is_a_really_long_word")
['this_is_a_really', '_long_word']
>>> w = TextWrapper(16, break_long_words=False)
>>> w.wrap("this_is_a_really_long_word")
['this_is_a_really_long_word']

score 3 · Answer

As DSM suggested, take a look at textwrap. If you would rather stick with regular expressions, the following will get you part of the way there:

In [10]: re.findall(r'.{,16}\b', text)
Out[10]: 
['Hello, this is ',
 'an example of ',
 'text shown in ',
 'the scrolling ',
 'display. Bla, ',
 'bla, bla',
 '']

(Note the missing exclamation mark and the empty string at the end.)
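A small sketch of where those two artifacts come from, using a shortened input ('bla!' is just an illustrative string, and `tail` an illustrative name): \b only matches between a word character and a non-word character, so there is no boundary after a final '!'.

```python
import re

# '!' is a non-word character at the very end of the string, so no word
# boundary (\b) follows it. The first match therefore stops before '!',
# and the next attempt can only succeed as an empty match in front of it.
tail = re.findall(r'.{,16}\b', 'bla!')
print(tail)  # ['bla', '']
```

Any text that ends in punctuation will probably show the same pattern, so the result needs post-processing if this regex is used as is.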

score 2 · Answer

Using a regular expression:

>>> import re
>>> from pprint import pprint
>>> text = 'Hello, this is an example of text shown in the scrolling display. Bla, bla, bla!'
>>> pprint(re.findall(r'.{1,16}(?:\s+|$)', text))
['Hello, this is ',
 'an example of ',
 'text shown in ',
 'the scrolling ',
 'display. Bla, ',
 'bla, bla!']
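The re.split('(\W+)', text) attempt from the question can also be completed by packing the tokens greedily. The following is only a sketch under stated assumptions: `split_for_display` is a hypothetical helper name, whitespace counts toward the width, and a single token longer than the width is kept whole rather than broken.

```python
import re

def split_for_display(text, width=16):
    """Greedily pack word and punctuation tokens into chunks of at most
    `width` characters (hypothetical helper, not from the thread)."""
    # The capturing group keeps the separators; drop empty strings that
    # re.split can produce at the start or end of the text.
    tokens = [t for t in re.split(r'(\W+)', text) if t]
    chunks, current = [], ''
    for token in tokens:
        # Start a new chunk when the next token would not fit any more.
        if current and len(current) + len(token) > width:
            chunks.append(current)
            current = ''
        current += token
    if current:
        chunks.append(current)
    return chunks

text = ('Hello, this is an example of text shown in the scrolling display. '
        'Bla, bla, bla!')
print(split_for_display(text))
```

For the example text this should reproduce the list from the question. Because punctuation and whitespace are separate tokens, a chunk can start with a separator such as ', ' when the preceding word just fits, so textwrap is likely the more robust choice.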
