
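I'm new to Python and need some help with my program. I have code that takes an unformatted text document, does some formatting (setting the page width and margins), and writes out a new text document. All of my code works fine except for this function, which produces the final output.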

Here is part of the problematic code:

def process(document, pagewidth, margins, formats):
    res = []
    onlypw = []
    pwmarg = []
    count = 0
    marg = 0

    for segment in margins:
        # copy the lines before this segment through unchanged
        for i in range(count, segment[0]):
            res.append(document[i])
        text = ''

        foundmargin = -1
        # collect the lines of this segment into one string
        for i in range(segment[0], segment[1]+1):
            marg = segment[2]
            text = text + '\n' + document[i].strip(' ')

        words = text.split()

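Note: in case you're wondering about the ranges, segment[0] just means the start of the document and segment[1] just means the end. My problem is that when I copy the text into words (in words = text.split()), it doesn't keep my blank lines. The output I should get is: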

      This is my substitute for pistol and ball. With a
      philosophical flourish Cato throws himself upon his sword; I
      quietly take to the ship. There is nothing surprising in
      this. If they but knew it, almost all men in their degree,
      some time or other, cherish very nearly the same feelings
      towards the ocean with me.

      There now is your insular city of the Manhattoes, belted
      round by wharves as Indian isles by coral reefs--commerce
      surrounds it with her surf.

What my current output looks like:

      This is my substitute for pistol and ball. With a
      philosophical flourish Cato throws himself upon his sword; I
      quietly take to the ship. There is nothing surprising in
      this. If they but knew it, almost all men in their degree,
      some time or other, cherish very nearly the same feelings
      towards the ocean with me. There now is your insular city of
      the Manhattoes, belted round by wharves as Indian isles by
      coral reefs--commerce surrounds it with her surf. 

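I know the problem happens when I copy the text into words, because that step doesn't keep the blank lines. How can I make sure it keeps the blank lines as well as the words? Let me know if I should add more code or details!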
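A minimal illustration of what is going on: str.split() with no arguments treats any run of whitespace, including the "\n\n" of a blank line, as a single separator, so the paragraph boundary is gone before the words are re-wrapped. The sample string below is made up for the demonstration.

```python
# Hypothetical two-paragraph string; the blank line marks the paragraph break
text = "towards the ocean with me.\n\nThere now is"

words = text.split()
print(words)
# ['towards', 'the', 'ocean', 'with', 'me.', 'There', 'now', 'is']
```

Nothing in the resulting list records where the first paragraph ended, which is why the two paragraphs run together in the output.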


2 Answers


First split on runs of at least 2 newlines, then split each piece into words:

import re

paragraphs = re.split('\n\n+', text)
words = [paragraph.split() for paragraph in paragraphs]

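You now have a list of lists, one per paragraph. Process each paragraph on its own, then join everything back into new text, reinserting the double newlines between paragraphs.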
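The split, process, rejoin approach could be sketched like this. It is only a sketch: it uses the standard library's textwrap.fill as a stand-in for the asker's own line-filling code, and the function name format_paragraphs, plus the assumption that pagewidth is the total line width including a margin of plain spaces, are hypothetical.

```python
import re
import textwrap


def format_paragraphs(text, pagewidth, margin):
    """Hypothetical helper: wrap each paragraph separately and keep
    one blank line between paragraphs.

    Assumes pagewidth is the total line width (margin included) and
    margin is a number of leading spaces.
    """
    indent = ' ' * margin
    # Split on runs of two or more newlines: each piece is one paragraph
    paragraphs = re.split(r'\n\n+', text.strip())
    wrapped = []
    for paragraph in paragraphs:
        words = paragraph.split()  # safe now: no paragraph breaks remain
        if not words:
            continue  # skip whitespace-only pieces
        # Re-fill this paragraph's words to the page width (textwrap
        # counts the indent as part of the width)
        wrapped.append(textwrap.fill(' '.join(words), width=pagewidth,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
    # Rejoin with a blank line (double newline) between paragraphs
    return '\n\n'.join(wrapped)


sample = ("This is my substitute for pistol and ball.\nWith a flourish.\n\n"
          "There now is your insular city.")
print(format_paragraphs(sample, 30, 6))
```

Each paragraph is wrapped on its own, so the blank line between them survives into the output.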

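I used re.split() to also handle paragraphs separated by more than 2 newlines; if there are only ever exactly 2 newlines between paragraphs, you can use the simpler text.split('\n\n').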
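The difference only shows up when there are more than two newlines between paragraphs. A small comparison on a made-up string with three newlines:

```python
import re

# Hypothetical text: three newlines (two blank lines) between paragraphs
text = "first paragraph\n\n\nsecond paragraph"

plain = text.split('\n\n')
print(plain)
# ['first paragraph', '\nsecond paragraph']  (a stray newline is left behind)

regex = re.split(r'\n\n+', text)
print(regex)
# ['first paragraph', 'second paragraph']
```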

answered 2013-03-14T20:26:34.833

Use a regular expression to find the words and the blank lines instead of splitting:

import re

# raw string so that \S reaches the regex engine unchanged
m = re.compile(r'(\S+|\n\n)')
words = m.findall(text)
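The token list then contains "\n\n" entries wherever a blank line was, and the wrapping loop can treat those as paragraph breaks. A minimal greedy word-wrap sketch over those tokens; the function name wrap_tokens and the pagewidth (total line width) and margin (leading spaces) parameters are hypothetical, not part of the original answer.

```python
import re


def wrap_tokens(text, pagewidth, margin):
    """Hypothetical greedy word wrap that keeps blank lines.

    pagewidth is assumed to be the total line width, margin the number
    of leading spaces on every line.
    """
    tokens = re.compile(r'(\S+|\n\n)').findall(text)
    indent = ' ' * margin
    lines, current = [], []

    def flush():
        # Emit the words collected so far as one indented line
        if current:
            lines.append(indent + ' '.join(current))
            current.clear()

    for token in tokens:
        if token == '\n\n':
            # Paragraph break: finish the line, then add one blank line
            # (consecutive breaks collapse into a single blank line)
            flush()
            if lines and lines[-1] != '':
                lines.append('')
        elif current and len(indent + ' '.join(current + [token])) > pagewidth:
            # This word would overflow the line: start a new one
            # (a single word longer than the width still gets its own line)
            flush()
            current.append(token)
        else:
            current.append(token)
    flush()
    # Drop a trailing blank line left by a final paragraph break
    while lines and lines[-1] == '':
        lines.pop()
    return '\n'.join(lines)


print(wrap_tokens("one two three four\n\nfive six", 12, 2))
```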
answered 2013-03-14T20:34:07.743