python - Python 2.7 - Find and replace from a text file into a new text file, using a dictionary

Question

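I'm new to programming and have been learning Python in my spare time for the past few months. I decided to try writing a small script that converts American spellings to English spellings in a text file.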

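I've been trying all sorts of things for the past 5 hours and eventually came up with something that gets me closer to my goal, but I'm not quite there yet!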

#imported dictionary contains 1800 english:american spelling key:value pairs. 
from english_american_dictionary import dict


def replace_all(text, dict):
    for english, american in dict.iteritems():
        text = text.replace(american, english)
    return text


my_text = open('test_file.txt', 'r')

for line in my_text:
    new_line = replace_all(line, dict)
    output = open('output_test_file.txt', 'a')
    print >> output, new_line

output.close()

I'm sure there is a much better way to go about this, but for this script, here are the issues I'm having:

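In the output file, the lines are written on every other line, with a blank line in between, but the original test_file.txt doesn't have this. The contents of test_file.txt are shown at the bottom of this page.
Only the first instance of an American spelling in a line gets converted to English.
I'd really rather not open the output file in append mode, but I couldn't figure out 'r' in this code structure.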

Any help is appreciated by this eager newbie!

The contents of test_file.txt are:

I am sample file.
I contain an english spelling: colour.
3 american spellings on 1 line: color, analyze, utilize.
1 american spelling on 1 line: familiarize.

score 8 · Accepted Answer

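The extra blank lines you're seeing are there because you're using print to write out a line that already includes a newline character at the end. Since print writes its own newline too, your output becomes double spaced. An easy fix is to use outfile.write(new_line) instead.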

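As for the file mode, the issue is that you're opening the output file over and over. You should open it just once, at the start. It's usually a good idea to use with statements to handle opening files, since they take care of closing the files for you when you're done with them.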
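Here's a minimal sketch of both fixes together. It assumes a small stand-in dictionary (the american_to_english name and its two entries are made up for illustration, not your 1800-entry one) and uses temporary files in place of your real ones. Each file is opened once in a with block, and write() is used so no extra newline gets added. It uses io.open and .items() so it should run on Python 3 as well as 2.7.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the asker's imported dictionary.
american_to_english = {'color': 'colour', 'analyze': 'analyse'}


def replace_all(text, spellings):
    # Plain substring replacement, as in the question.
    for american, english in spellings.items():
        text = text.replace(american, english)
    return text


def convert_file(in_path, out_path, spellings):
    # Both files are opened exactly once; 'w' truncates any old output.
    with io.open(in_path, 'r', encoding='utf-8') as in_file:
        with io.open(out_path, 'w', encoding='utf-8') as out_file:
            for line in in_file:
                # write() emits the line as-is, so its own '\n' is the only one.
                out_file.write(replace_all(line, spellings))


# Demo on temporary files instead of the real test_file.txt.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'test_file.txt')
dst = os.path.join(tmp_dir, 'output_test_file.txt')
with io.open(src, 'w', encoding='utf-8') as f:
    f.write(u'first color\nthen analyze\n')

convert_file(src, dst, american_to_english)

with io.open(dst, 'r', encoding='utf-8') as f:
    result = f.read()
print(result)
```

Because the output file is opened in 'w' mode a single time, rerunning the script replaces the old output instead of growing it.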

I don't understand your other issue, where only some of the replacements happen. Is your dictionary missing the spellings for 'analyze' and 'utilize'?

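One suggestion I'd make is to not do your replacements line by line. You can read the whole file in at once with file.read() and then work on it as a single unit. This will probably be faster, since it won't need to loop over the items in your spelling dictionary as often (just once, rather than once per line):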

with open('test_file.txt', 'r') as in_file:
    text = in_file.read()

with open('output_test_file.txt', 'w') as out_file:
    out_file.write(replace_all(text, spelling_dict))

Edit:

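To make your code correctly handle words that contain other words (such as "entire" containing "tire"), you'll probably need to abandon the simple str.replace approach in favor of regular expressions.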
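To make the problem concrete, here is a tiny sketch (the sample string is made up) comparing plain str.replace with a regular expression anchored at word boundaries:

```python
import re

text = 'entire tire'

# str.replace matches 'tire' anywhere, including inside 'entire'.
naive = text.replace('tire', 'tyre')

# \b anchors the match at word boundaries, so only the standalone word changes.
whole_word = re.sub(r'\btire\b', 'tyre', text)

print(naive)       # entyre tyre
print(whole_word)  # entire tyre
```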

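Here's a quickly thrown together solution that uses re.sub, given a dictionary of spelling changes from American to British English (that is, in the reverse order of your current dictionary):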

import re

#from english_american_dictionary import ame_to_bre_spellings
ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def ame_to_bre(text):
    pattern = r'\b\w+\b'  # this pattern matches whole words only
    replacer = replacer_factory(ame_to_bre_spellings)
    return re.sub(pattern, replacer, text)

def main():
    #with open('test_file.txt') as in_file:
    #    text = in_file.read()
    text = 'foo color, entire, utilize'

    #with open('output_test_file.txt', 'w') as out_file:
    #    out_file.write(ame_to_bre(text))
    print(ame_to_bre(text))

if __name__ == '__main__':
    main()

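One nice thing about this code structure is that you can easily convert British English spellings back to American English ones by passing a dictionary in the other order to the replacer_factory function.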
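As a rough sketch of that reverse direction: the inverted dictionary can be built from the forward one, assuming every American spelling maps to a distinct British spelling (otherwise entries would collide). The convert helper and the bre_to_ame_spellings name are hypothetical additions for illustration.

```python
import re

ame_to_bre_spellings = {'tire': 'tyre', 'color': 'colour', 'utilize': 'utilise'}

# Swap keys and values to get a British-to-American mapping.
bre_to_ame_spellings = {bre: ame for ame, bre in ame_to_bre_spellings.items()}


def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        # Words not in the dictionary are returned unchanged.
        return spelling_dict.get(word, word)
    return replacer


def convert(text, spelling_dict):
    # Hypothetical generalisation of ame_to_bre: the dictionary is a parameter.
    return re.sub(r'\b\w+\b', replacer_factory(spelling_dict), text)


british = convert('foo color, entire, utilize', ame_to_bre_spellings)
american = convert(british, bre_to_ame_spellings)
print(british)   # foo colour, entire, utilise
print(american)  # foo color, entire, utilize
```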

score 3 · Accepted Answer

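The print statement adds a newline of its own, but your lines already have their own newline. You can either strip the newline from new_line, or use the lower-level

output.write(new_line)

instead (it writes exactly what you pass to it).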
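For the first option, a small sketch of stripping the newline before printing. It is shown with Python 3's print function and an in-memory buffer standing in for the output file; in Python 2.7 the equivalent line would be print >> output, new_line.rstrip('\n').

```python
import io

new_line = 'I contain an english spelling: colour.\n'

# In-memory stand-in for the output file.
output = io.StringIO()

# print appends its own '\n', so remove the one already on the line first.
print(new_line.rstrip('\n'), file=output)

written = output.getvalue()
```

Only the trailing newline is removed; other whitespace on the line is left alone.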

For your second question, I think we need an actual example. replace() should indeed replace all occurrences.

>>> "abc abc abcd ab".replace("abc", "def")
'def def defd ab'

I'm not sure what your third question is asking. If you want to replace the output file, do

output = open('output_test_file.txt', 'w')

'w' means that you are opening the file for writing.
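A quick sketch of how 'w' differs from 'a', using a temporary file rather than your real output file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'output_test_file.txt')

# 'w' truncates the file every time it is opened, so only one line survives.
for _ in range(2):
    with open(path, 'w') as output:
        output.write('line\n')
with open(path) as f:
    after_w = f.read()

# 'a' keeps the existing content and adds to the end of it.
for _ in range(2):
    with open(path, 'a') as output:
        output.write('line\n')
with open(path) as f:
    after_a = f.read()

print(repr(after_w))  # 'line\n'
print(repr(after_a))  # 'line\nline\nline\n'
```

This is also why opening with 'w' inside your loop would leave only the last line in the file: the open call belongs before the loop.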

score 2 · Accepted Answer

In addition to all the good answers above, I wrote a new version that I think is more Pythonic. I hope this helps:

# stand-in for the imported dictionary of 1800 english:american spelling key:value pairs.
mydict = {
    'colour': 'color',
}


def replace_all(text, mydict):
    for english, american in mydict.iteritems():
        text = text.replace(american, english)
    return text

try:
    with open('new_output.txt', 'w') as new_file:
        with open('test_file.txt', 'r') as f:
            for line in f:
                new_line = replace_all(line, mydict)
                new_file.write(new_line)
except IOError:
    print "Can't open file!"

You can also look at an answer I gave earlier, which contains a lot of best-practice advice: Loading large file (25k entries) into dict is slow in Python?

Here are some other tips on how to write more Pythonic Python :) http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

Good luck :)
