python - python, string.replace() and \n

Question

(Edit: the script seems to work for the others here who are trying to help. Is it because I'm running Python 2.7? I'm really at a loss...)

I have a raw text file of a book that I'm trying to mark up with page tags.

Suppose the text file is:

some words on this line,
1
DOCUMENT TITLE some more words here too.
2
DOCUMENT TITLE and finally still more words.

Using Python, I'm trying to turn the sample text into:

some words on this line,
</pg>
<pg n=2>some more words here too.
</pg>
<pg n=3>and finally still more words.

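My strategy is to load the text file as a string, build search and replace strings for a list of page numbers, replace every instance in the string, and write the result to a new file.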
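The strategy just described can be checked on an in-memory string before any files are involved. This is a minimal sketch, assuming the sample text shown above; the function name mark_pages and its parameters are hypothetical. It differs from a naive version in two ways: each search string starts with a newline, so the search for page 1 cannot also match the tail of a later marker such as "11\nDOCUMENT TITLE", and it includes the space after the title so the new page does not begin with a stray space.

```python
from __future__ import print_function


def mark_pages(text, title="DOCUMENT TITLE", last_page=400):
    """Hypothetical helper: replace each '\\n<n>\\n<title> ' with page tags."""
    for n in range(1, last_page):
        # Leading "\n" anchors the number to the start of its line, so
        # "1\nDOCUMENT TITLE" inside "11\nDOCUMENT TITLE" is not matched.
        search = "\n%d\n%s " % (n, title)
        replace = "\n</pg>\n<pg n=%d>" % (n + 1)
        # str.replace returns a new string, so the result must be rebound.
        text = text.replace(search, replace)
    return text


sample = ("some words on this line,\n"
          "1\n"
          "DOCUMENT TITLE some more words here too.\n"
          "2\n"
          "DOCUMENT TITLE and finally still more words.")

print(mark_pages(sample))
```

If this prints the desired output but the real file is unchanged, the problem lies in how the file's contents differ from the sample rather than in the replace loop itself.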

Here is the code I wrote:

from sys import argv
script, input, output = argv

textin = open(input,'r')
bookstring = textin.read()
textin.close()

pages = []
x = 1
while x<400:
    pages.append(x)
    x = x + 1

pagedel = "DOCUMENT TITLE"

for i in pages:
    pgdel = "%d\n%s" % (i, pagedel)
    nplus = i + 1
    htmlpg = "</pg>\n<pg n=%d>" % nplus
    bookstring = bookstring.replace(pgdel, htmlpg)

textout = open(output, 'w')
textout.write(bookstring)
textout.close()

print "Updates to %s printed to %s" % (input, output)

The script runs without errors, but it doesn't change the input text at all. It just reprints it character for character.

Does my mistake have to do with the hard returns (\n)? Any help is greatly appreciated.
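One way to test the hard-return theory: if the book file has Windows-style \r\n line endings and is read without newline translation (for example with mode 'r' on Python 2.7 under Linux or macOS), the "\n" in pgdel never matches and replace silently changes nothing. This sketch assumes such a file; it is only one possible cause. Python 2.7's open(path, 'rU') enables universal-newline mode, and Python 3 translates line endings in text mode by default.

```python
from __future__ import print_function

# Assumption: the file was saved with "\r\n" line endings (only one possible cause).
windows_text = "some words on this line,\r\n1\r\nDOCUMENT TITLE some more words here too."

# Same shape as pgdel in the question: "<number>\n<title>".
search = "%d\n%s" % (1, "DOCUMENT TITLE")

# No match: after the "1" comes "\r", not "\n", so replace() would do nothing.
print(search in windows_text)

# Converting "\r\n" (and lone "\r") to "\n" first lets the same search succeed.
normalized = windows_text.replace("\r\n", "\n").replace("\r", "\n")
print(search in normalized)

# repr() makes hidden characters such as "\r" visible while debugging.
print(repr(windows_text[:30]))
```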

score 4 · Accepted Answer

In Python, strings are immutable, so replace returns a new string with the replacements made instead of modifying the original string in place.

You have to do this:

bookstring = bookstring.replace(pgdel, htmlpg)
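A quick check of the immutability point, using a made-up one-line string:

```python
from __future__ import print_function

s = "1\nDOCUMENT TITLE some words"

# The return value is discarded here, and s itself is never modified.
s.replace("DOCUMENT TITLE ", "")
print(s == "1\nDOCUMENT TITLE some words")   # True

# Rebinding the name keeps the replaced text.
s = s.replace("DOCUMENT TITLE ", "")
print(repr(s))                                # '1\nsome words'
```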

You also forgot to call close(). See how you wrote textin.close? You have to call it with parentheses, just like open:

textin.close()
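The difference between textin.close and textin.close() is easy to check. This sketch uses io.StringIO as a stand-in for a real file object, since it has the same close() method and closed attribute; a with block is shown as the usual way to avoid forgetting the call.

```python
from __future__ import print_function
import io

f = io.StringIO(u"some text")

# Without parentheses this only looks up the bound method; nothing is called.
f.close
print(f.closed)    # False

# With parentheses the method actually runs and the stream is closed.
f.close()
print(f.closed)    # True

# A with block closes the stream automatically, even if an exception occurs.
with io.StringIO(u"more text") as g:
    data = g.read()
print(g.closed)    # True
```

The same pattern applies to real files, e.g. with open(path) as textin: followed by bookstring = textin.read().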

Your code works for me, but I'd add a few more tips:

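- input is a built-in function, so consider renaming that variable. Although it works for me, it may not work for you.
- When running the script, don't forget the .txt extensions:
  $ python myscript.py file1.txt file2.txt
- Make sure to clear the contents of file2 between test runs.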
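Why renaming input matters can be shown in a few lines. After the question's script, input, output = argv, the name input refers to a string for the rest of the module, so any later call to input() fails. Unpacking into other names, for example the hypothetical in_path and out_path, avoids this. The string value below is arbitrary.

```python
from __future__ import print_function

# Rebinding "input" (as unpacking argv into it does) hides the built-in function.
input = "file1.txt"

shadowed_call_failed = False
try:
    input("Page count? ")       # this calls the string, not the built-in
except TypeError as exc:
    shadowed_call_failed = True
    print("input() is not callable any more:", exc)

# Deleting the module-level name makes the built-in visible again.
del input
print(callable(input))          # True
```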

I hope these help!

score 0 · Accepted Answer

Here's an entirely different approach that uses re (import the re module for this to work):

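import re

# bookstring holds the file contents, read as in the question's script
doctitle = False   # True when the previous line was a page number
newstr = ''
page = 1

for line in bookstring.splitlines():
    res = re.match(r'^\d+', line)
    if doctitle:
        # Line after a page number: open the next page and drop the title
        newstr += '<pg n=' + str(page) + '>' + re.sub('^DOCUMENT TITLE ', '', line)
        doctitle = False
    elif res:
        # Page-number line: close the current page
        doctitle = True
        page += 1
        newstr += '\n</pg>\n'
    else:
        newstr += line

print newstr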

Since no one knows what's going on, it's worth a try.
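Building on the regex idea in this answer, a single re.sub pass with a replacement function is another option. This is a sketch under stated assumptions, not a drop-in replacement: it assumes the title text is exactly DOCUMENT TITLE, and the names MARKER and mark_pages_re are hypothetical. Unlike the line-by-line loop, it only treats a number as a page marker when the next line starts with the title, keeps the original line breaks inside each page, and tolerates \r\n line endings.

```python
from __future__ import print_function
import re

# A page break is assumed to be a line holding only the page number, followed by
# a line that begins with the title. "\r?" also accepts Windows line endings, and
# [ \t]* absorbs stray spaces or tabs after the number and after the title.
MARKER = re.compile(r"^(\d+)[ \t]*\r?\nDOCUMENT TITLE[ \t]*", re.MULTILINE)


def mark_pages_re(text):
    """Replace every page marker in text with closing and opening pg tags."""
    def repl(match):
        # The page that follows marker N is numbered N + 1.
        return "</pg>\n<pg n=%d>" % (int(match.group(1)) + 1)
    return MARKER.sub(repl, text)


sample = ("some words on this line,\n"
          "1\n"
          "DOCUMENT TITLE some more words here too.\n"
          "2\n"
          "DOCUMENT TITLE and finally still more words.")
print(mark_pages_re(sample))
```

Because ^ and \d+ match the whole number at the start of a line, a marker such as 11 is read as page 11 rather than being partly consumed by the search for page 1.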
