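I have a txt file (myText.txt) containing multiple lines of text.

I would like to know:

  • how to create a list of words that need to be deleted (I want to set the words myself)
  • how to create a list of words that need to be replaced

For example, if myText.txt is:

    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.

  • I want to delete "the", "and" and "in", and replace "ancient" with "old"
  • I want to replace "month" and "centuries" with "years"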

3 Answers

You can always use a regular expression:

    import re

    st = '''\
    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

    deletions = ('and', 'in', 'the')
    repl = {"ancient": "old", "month": "years", "centuries": "years"}

    # Delete whole words only; \b keeps "in" from matching inside "influenced".
    tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
    st = re.sub(tgt, '', st)

    # Replace each whole word with its substitute.
    for word in repl:
        tgt = r'\b{}\b'.format(re.escape(word))
        st = re.sub(tgt, repl[word], st)

    print(st)
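
One thing worth noting about the regex approach above: the pattern is case-sensitive, so the capitalised "The" at the start of the text survives, and every deleted word leaves its surrounding spaces behind. Below is a minimal sketch of one way to handle both, assuming case-insensitive deletion is actually what is wanted. `delete_words` is a hypothetical helper name, not something from the original answer.

```python
import re

def delete_words(text, words):
    """Remove whole-word matches of `words`, ignoring case, and tidy spaces."""
    # \b anchors keep "in" from matching inside "influenced" or "Latin".
    pattern = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in words))
    text = re.sub(pattern, '', text, flags=re.IGNORECASE)
    # Collapse the runs of spaces left behind, then trim each line.
    text = re.sub(r' {2,}', ' ', text)
    return '\n'.join(line.strip() for line in text.split('\n'))

cleaned = delete_words("The ancient Romans stayed in Roma and Latin spread.",
                       ["the", "and", "in"])
print(cleaned)
```

If the capitalised forms should be kept, dropping `flags=re.IGNORECASE` restores the case-sensitive behaviour.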
answered 2013-08-20T03:38:09.497

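This should do the trick. You use a list to store the words you want to delete, then loop over the list and remove each element from the contents string. Then you use a dictionary to map the words you currently have to the words you want to replace them with. You loop over that as well and swap each current word for its replacement.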

    def replace():
        contents = ""
        deleteWords = ["the ", "and ", "in "]
        replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

        # Read the whole file into one string.
        with open("myText.txt") as f:
            contents = f.read()

        # Remove every occurrence of each unwanted word.
        for word in deleteWords:
            contents = contents.replace(word, "")

        # Swap each key for its replacement; items() works in Python 2 and 3.
        for key, value in replaceWords.items():
            contents = contents.replace(key, value)
        return contents
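
A caveat with plain `str.replace`: it matches substrings, not words, so a pattern like "in " also hits any word that merely ends in "in". The small comparison below illustrates the difference on a made-up sentence (the sample string is an assumption for illustration, not taken from myText.txt).

```python
import re

sample = "Latin began in Roma"

# Plain substring replacement also removes the tail of "Latin".
plain = sample.replace("in ", "")

# A word-boundary regex only removes the standalone word "in".
bounded = re.sub(r'\bin\b ?', '', sample)

print(plain)
print(bounded)
```

On the example text from the question the substring version happens to work, because "Latin" is followed by a comma there, but that is not guaranteed for other input.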
answered 2013-08-20T03:08:40.577

Use a list for the deletions and a dictionary for the replacements. It should look something like this:

    def processTextFile(filename_in, filename_out, delWords, repWords):
        # Open both files once; "w" starts the output file fresh on every run.
        with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
            for line in sourcefile:
                for item in delWords:
                    line = line.replace(item, "")
                for key, value in repWords.items():
                    line = line.replace(key, value)
                outfile.write(line)


    if __name__ == "__main__":
        delWords = []
        repWords = {}

        delWords.extend(["the ", "and ", "in "])
        repWords["ancient"] = "old"
        repWords["month"] = "years"
        repWords["centuries"] = "years"

        processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

Note that this was written for Python 3.3.2, which is why I use items(). If you are on Python 2.x, use iteritems() instead, since I think it is more efficient, especially for large text files.
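
The approaches in this thread rescan the text once per word. As a possible alternative, the sketch below folds the deletions and replacements into a single compiled pattern and applies them in one pass with a replacement callback. `apply_word_rules` and its parameters are hypothetical names introduced here, and the sketch assumes case-sensitive, whole-word matching.

```python
import re

def apply_word_rules(text, deletions, replacements):
    """Delete and replace whole words in a single pass over `text`."""
    # Treat a deletion as a replacement with the empty string.
    rules = {word: "" for word in deletions}
    rules.update(replacements)
    if not rules:
        return text  # nothing to do; avoids building an empty pattern
    pattern = re.compile(
        r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in rules)))
    # The callback looks up whichever word actually matched.
    return pattern.sub(lambda m: rules[m.group(0)], text)

result = apply_word_rules(
    "They stayed in Roma for 3 month.",
    deletions=["the", "and", "in"],
    replacements={"ancient": "old", "month": "years", "centuries": "years"},
)
print(result)
```

Because each match is replaced exactly once, a replacement word is never itself picked up by a later rule, which can happen when the rules are applied in a loop. Deleted words still leave their surrounding spaces in place.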

answered 2013-08-20T03:47:39.387