python - How do I format a text file using Python or any other programming/scripting language?

Question

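I would like to know how to format a text file using Python or any other programming/scripting language.

The current format in the text file looks like this: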

ABALONE
Ab`a*lo"ne, n. (Zoöl.)

Defn: A univalve mollusk of the genus Haliotis. The shell is lined
with mother-of-pearl, and used for ornamental purposes; the sea-ear.
Several large species are found on the coast of California, clinging
closely to the rocks.

I would like it to look like this (all on one line, leaving out some of the words, etc.):

ABALONE : A univalve mollusk of the genus Haliotis. The shell is lined with

score 1 · Accepted Answer

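Assuming the format is always exactly as you describe (word, pronunciation, blank line, "Defn:", definition), this is a simple matter of string splitting and joining: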

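def reformat(text):
    # Split off the first three lines; everything after them stays in lines[3].
    lines = text.split('\n', 3)
    word = lines[0]
    # Drop the 'Defn:' prefix and the whitespace around the paragraph.
    definition_paragraph = lines[3][len('Defn:'):].strip()
    # Join the wrapped definition into a single line.
    definition_line = definition_paragraph.replace('\n', ' ')
    return word + ' : ' + definition_line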

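The idea is to make a piece of code that can easily be called to fix the text. In this case the function is called reformat, and it works by splitting the given text into the first three lines and the definition, extracting the definition from the paragraph, and gluing the word itself together with the definition.

Another solution is a regular expression, which is better suited to the task but may be harder to understand because of its odd syntax: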
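To make the splitting step concrete, here is a small sketch of what split('\n', 3) returns for an entry in this format. The variable names and the shortened sample text are only for illustration.

```python
# Shortened sample entry in the format from the question (illustrative only).
entry = 'ABALONE\nAb`a*lo"ne, n.\n\nDefn: A univalve mollusk\nof the genus Haliotis.'

# maxsplit=3 cuts at the first three newlines only, so the whole
# definition paragraph (including its own line breaks) stays in parts[3].
parts = entry.split('\n', 3)

word = parts[0]           # the headword line
pronunciation = parts[1]  # the pronunciation line (discarded by reformat)
blank = parts[2]          # the empty separator line
rest = parts[3]           # 'Defn: ...' plus all following lines
print(parts)
```

Because the split stops after three cuts, the definition can wrap over any number of lines without changing the indexing.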

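import re
# word line, pronunciation line, blank line, then 'Defn: ' and the definition;
# re.DOTALL lets the last group run across line breaks.
pattern = re.compile(r'(.+?)\n.+?\n\nDefn: (.+)', re.DOTALL)
def reformat(text):
    word, definition = pattern.search(text).groups()
    return word + ' : ' + definition.strip().replace('\n', ' ')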

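This should do the same thing as the other code above, but it is simpler, more flexible, and portable to other languages.

To use either of the methods above, just call it, passing the text as the argument.

To replace the text in a file, you need to open the file, read its contents, reformat them with either of the functions above, and save the result back to the file: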
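As a minimal sketch of such a call, the example below runs the regex version of reformat on the entry from the question. The variable names entry and result are made up for this illustration.

```python
import re

# Same pattern as in the answer: word line, pronunciation line,
# blank line, then 'Defn: ' followed by the definition.
pattern = re.compile(r'(.+?)\n.+?\n\nDefn: (.+)', re.DOTALL)

def reformat(text):
    word, definition = pattern.search(text).groups()
    return word + ' : ' + definition.replace('\n', ' ')

# The entry from the question, held in a string (illustrative).
entry = (
    'ABALONE\n'
    'Ab`a*lo"ne, n. (Zoöl.)\n'
    '\n'
    'Defn: A univalve mollusk of the genus Haliotis. The shell is lined\n'
    'with mother-of-pearl, and used for ornamental purposes; the sea-ear.'
)

result = reformat(entry)
print(result)
```

Note that pattern.search returns None when the text does not follow the expected layout, so input in a different format would raise an AttributeError here.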

with open('word.txt') as open_file:
    text = open_file.read()

with open('word.txt', 'w') as open_file:
    open_file.write(reformat(text))

If you need to do this for every file in a given directory, for example, take a look at listdir in the os module.
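A rough sketch of that directory loop might look like the following. The function name reformat_directory, its parameters, and the choice of UTF-8 encoding are assumptions made for this example, not part of the original answer; the reformat function is passed in so either version can be used.

```python
import os

def reformat_directory(directory, reformat):
    """Rewrite every regular file in `directory` in place using `reformat`.

    Hypothetical helper: returns the names of the files it rewrote.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-file entries
        # Assumption: the files are UTF-8 encoded text.
        with open(path, encoding='utf-8') as open_file:
            text = open_file.read()
        with open(path, 'w', encoding='utf-8') as open_file:
            open_file.write(reformat(text))
        changed.append(name)
    return changed
```

Since the files are overwritten in place, it may be worth trying this on a copy of the directory first.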
