python - Trying to open a file and create a new one, replacing all 4-letter words with "xxxxxx"

Question

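Possible duplicate:
Replacing four-letter words in Python

For an assignment, I need to open a file, replace every four-letter word with "xxxxxx", and write the resulting text to a new file.

This is the text of the original file we were given: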

The 3 lines in this file end with the new line character.

There is a blank line above this line.

This is what I have so far:

def censor(filename):
    infile = open(filename, "r")
    content = infile.read()
    infile.close()
    outfile = open("censored.txt", "w")
    content = content.replace("this", "xxxxxx")
    content = content.replace("file", "xxxxxx")
    content = content.replace("with", "xxxxxx")
    content = content.replace("line", "xxxxxx")
    outfile.write(content)
    outfile.close()

This is the result:

The 3 xxxxxxs in xxxxxx xxxxxx end xxxxxx the new xxxxxx character.

There is a blank xxxxxx above xxxxxx xxxxxx.

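I'm having trouble getting "line" to change without also changing "lines"; at the moment "lines" becomes "xxxxxxs".

Does anyone know a specific way to do this? Is an if statement needed?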

score 1 · Accepted Answer

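This will get you started. It's untested, but it should handle words that have punctuation attached. It's also more robust because it iterates line by line, so it copes with files longer than 3 lines, and it replaces any 4-letter word, not just the ones you already know about.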

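def censor(filename):
    # Read line by line so files of any length work.
    with open(filename, "r") as infile, open("censored.txt", "w") as outfile:
        for line in infile:
            words = line.split(" ")
            for i, word in enumerate(words):
                # Drop punctuation and newlines so "line." counts as "line".
                core = "".join(c for c in word if c.isalnum())
                if len(core) == 4:
                    # Replace inside this token only, so "lines" is left alone.
                    words[i] = word.replace(core, "xxxxxx")
            outfile.write(" ".join(words))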

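Others have suggested regular expressions, but I'd say this problem is simple enough that a regex adds a fair amount of complexity, especially for someone new to programming. That said, regular expressions are very useful and powerful, and well worth learning.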
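To illustrate why checking each token separately matters: `str.replace` matches substrings, so replacing "line" across a whole string also hits the "line" inside "lines". The helper name `censor_token` and its `replacement` parameter are made up for this sketch, which assumes words are separated by single spaces.

```python
def censor_token(token, replacement="xxxxxx"):
    """Censor one space-delimited token if its alphanumeric core is 4 characters long."""
    core = "".join(c for c in token if c.isalnum())
    return token.replace(core, replacement) if len(core) == 4 else token


sample = "3 lines, one line"

# Substring replacement also changes "lines".
print(sample.replace("line", "xxxxxx"))

# Per-token checking leaves "lines," alone and censors only "line".
print(" ".join(censor_token(t) for t in sample.split(" ")))
```

The first call prints "3 xxxxxxs, one xxxxxx", the second "3 lines, one xxxxxx".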

score 0 · Accepted Answer

First, import re:

import re

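Then replace every standalone run of exactly 4 word characters with xxxxxx. Note that \w also matches digits and underscores, so a token such as 2024 would be replaced too:

content = re.sub(r"\b\w{4}\b", "xxxxxx", content)

Testing it in the REPL:

>>> import re
>>> re.sub(r"\b\w{4}\b", "xxxxxx", "Thes 3 lines in this file end with the new line character.")
'xxxxxx 3 lines in xxxxxx xxxxxx end xxxxxx the new xxxxxx character.'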
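Putting the regex idea into the shape the question asks for might look like the sketch below. It is only one possible arrangement: the names `censor_text`, `FOUR_LETTER_WORD`, and the `out_filename` parameter are assumptions for illustration, and the pattern uses `[A-Za-z]` instead of `\w` so that four-digit numbers are left alone.

```python
import re

# Exactly four ASCII letters with a word boundary on each side.
FOUR_LETTER_WORD = re.compile(r"\b[A-Za-z]{4}\b")


def censor_text(text, replacement="xxxxxx"):
    """Return text with every standalone four-letter word replaced."""
    return FOUR_LETTER_WORD.sub(replacement, text)


def censor(filename, out_filename="censored.txt"):
    """Read filename, censor its contents, and write them to out_filename."""
    with open(filename, "r") as infile:
        content = infile.read()
    with open(out_filename, "w") as outfile:
        outfile.write(censor_text(content))
```

Because the apostrophe counts as a word boundary, a contraction such as "they're" would have its "they" part replaced; whether that is wanted depends on how the assignment defines a word.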

score 0 · Accepted Answer

import re
content = re.sub(r"\bline\b", "xxxxxx", content)

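But most likely your teacher doesn't want you to hard-code every four-letter word in the snippet; he probably wants the code to work for all four-letter words in any text file. That calls for a different approach, which I encourage you to think about (-=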
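On the single-word case: `str.replace` looks for a literal substring and does not interpret regex escapes such as `\b`, so matching on word boundaries has to go through the `re` module. A quick check, using a made-up sample sentence:

```python
import re

text = "There is a blank line above these lines."

# str.replace searches for the literal characters backslash, "b", "line", backslash, "b".
literal = text.replace("\\bline\\b", "xxxxxx")

# re.sub treats \b as a word boundary, so only the standalone word "line" matches.
bounded = re.sub(r"\bline\b", "xxxxxx", text)

print(literal)
print(bounded)
```

The first line printed is the unchanged sentence; the second has only the standalone "line" replaced, with "lines" intact.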
