3

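I have a text string 1000 characters long, and I want to split it into chunks of fewer than 100 characters without splitting any word (99 characters is fine, 100 is not). Wrapping/splitting may happen only at spaces: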

Example:

text = "... this is a test , and so on..."
                              ^
                  #position: 100

It should be split into:

newlist = ['... this is a test ,', ' and so on...', ...]

I want to end up with a list, newlist, holding the text split correctly into readable chunks (no words cut in the middle). How would you do it?

4

3 Answers

3

You can use the textwrap module:

In [2]: import textwrap

In [3]: textwrap.wrap("""Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
   ...: tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
   ...: quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
   ...: consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
   ...: cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
   ...: proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
        """, 40)
Out[3]: 
['Lorem ipsum dolor sit amet, consectetur',
 'adipisicing elit, sed do eiusmod tempor',
 'incididunt ut labore et dolore magna',
 'aliqua. Ut enim ad minim veniam, quis',
 'nostrud exercitation ullamco laboris',
 'nisi ut aliquip ex ea commodo consequat.',
 'Duis aute irure dolor in reprehenderit',
 'in voluptate velit esse cillum dolore eu',
 'fugiat nulla pariatur. Excepteur sint',
 'occaecat cupidatat non proident, sunt in',
 'culpa qui officia deserunt mollit anim',
 'id est laborum.']
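
Applied to the limit in the question, a minimal sketch might look like this. It assumes width=99 is the right setting, since width is the maximum length of each returned line and the question allows 99 characters but not 100. The sample text is made up for illustration. Note that textwrap.wrap drops the whitespace at each break, so chunks will not begin with a space the way the question's example does.

```python
import textwrap

# Hypothetical sample input: roughly 1000 characters of space-separated words.
text = "this is a test, and so on... " * 35

# width is the maximum length of each returned line, so 99 keeps every
# chunk strictly under 100 characters.
newlist = textwrap.wrap(text, width=99)

for chunk in newlist:
    print(len(chunk), repr(chunk))
```

Joining the chunks back with single spaces gives the original words in order, which is a quick way to check that nothing was cut.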
answered 2013-10-18T19:31:24.153
3

Use the wrap function from the textwrap module. The example below splits the text into lines at most 10 characters wide:

In [1]: import textwrap

In [2]: textwrap.wrap("... this is a test , and so on...", 10)
Out[2]: ['... this', 'is a test', ', and so', 'on...']
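
One caveat for the "only split at spaces" requirement: by default wrap may also break after a hyphen and will cut a word that is longer than the width. The sketch below, using a made-up sample string, turns both behaviours off with the break_long_words and break_on_hyphens keyword arguments. The trade-off is that a word longer than the width is then kept whole, so that line can exceed the limit.

```python
import textwrap

# Hypothetical sample: one hyphenated word and one word longer than the width.
text = "a well-known supercalifragilistic example"

# Defaults: may break after the hyphen and will cut the over-long word.
print(textwrap.wrap(text, 10))

# Break only at whitespace; an over-long word stays whole on its own line,
# so that line can be longer than the requested width.
strict = textwrap.wrap(text, 10, break_long_words=False, break_on_hyphens=False)
print(strict)
```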
answered 2013-10-18T19:31:30.223
0

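As others have said, use textwrap, but here is another option. Note that it slices the string into fixed-size pieces and does not respect word boundaries: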

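def splitter(s, n):
    # Yield consecutive n-character slices of s; the last one may be shorter.
    for start in range(0, len(s), n):
        yield s[start:start+n]

data = "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij"
for splitee in splitter(data, 10):
    print(splitee)  # print() as a function, so this also runs on Python 3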
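
The generator above cuts at fixed offsets, so it can split a word in two. A word-aware variant of the same idea could be sketched as follows. The name split_on_spaces, the limit parameter, and the choice to raise ValueError when no space is available are all assumptions made for illustration. Unlike textwrap.wrap, this version keeps the space at the start of the next chunk, which matches the output shown in the question.

```python
def split_on_spaces(s, limit):
    """Yield chunks of s, each shorter than limit, breaking only at spaces."""
    start = 0
    while len(s) - start >= limit:
        # Last space at which the chunk can end while staying under the limit.
        # Searching from start + 1 guarantees every chunk is non-empty.
        cut = s.rfind(" ", start + 1, start + limit)
        if cut == -1:
            raise ValueError("no space to break on within the limit")
        yield s[start:cut]
        start = cut  # the space stays at the head of the next chunk
    yield s[start:]


text = "... this is a test , and so on..."
print(list(split_on_spaces(text, 21)))
# ['... this is a test ,', ' and so on...']
```

Because no characters are dropped, "".join(...) over the chunks reproduces the original string exactly. For the question's case, limit would be 100.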
answered 2013-10-18T19:34:55.297