python - Using string.punctuation in a for loop to replace characters

Question

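In short, I am trying to replace any punctuation in the words of a line with a space.

For example, once processed, the text document output would have no punctuation, like this.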

Meep Meep! I tought I taw a putty tat. I did I did I did taw a putty tat Shsssssssssh I am hunting wabbits Heh Heh Heh Heh Its a fine day to hunt wabbits Heh Heh Heh Stop its wabbit huntin season Huntin Wabbits The Final Guide 101 ways to kook wabbit

Unaltered, it looks like this.

Text from question5.txt

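Meep Meep! I tought I taw a putty tat. I did! I did! I did taw a putty tat. Shsssssssssh ... I am hunting wabbits. Heh Heh Heh Heh ... It's a fine day to hunt wabbits! ... Heh Heh Heh ... Stop - its wabbit huntin season! Huntin Wabbits: The Final Guide to 101 ways to kook wabbit.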

This is an exercise, so I was told to use .replace and a for loop.

import string
infile = open('question5.txt', 'r')

lines = infile.readlines()
lines = str(lines)
for words in lines:
    for letters in words:
        letters.replace(string.punctuation,' ')
        print(letters)

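Any help with solving this would be greatly appreciated.

Note: after your suggestions and some research, this is what I ended up with a few more hours later, in case anyone is following the outcome. Thanks, everyone.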

import string
infile = open('question5.txt', 'r')
lines = infile.readlines()

def word_count(list):
    count = 0
    list = str(list)
    for lines in list:
        list = list.replace('.',' ')
        list = list.replace(',',' ')
        list = list.replace('-',' ')

    split = list.split()
    print (split)
    for words in split:
        count = count + 1
    return count


for line in lines:
    count = word_count(line)
    print(count)
infile.close()

score 3 · Accepted Answer

This is better:

import string as st

trans = st.maketrans(st.punctuation, ' '*len(st.punctuation))
with open('question5.txt', 'r') as f:
    for line in f:
        print line.translate(trans)
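The snippet above is Python 2 (string.maketrans and the print statement). A minimal Python 3 sketch of the same idea follows; the function name punctuation_to_spaces is only illustrative and not part of the original answer.

```python
import string

# In Python 3 the table is built with str.maketrans: each punctuation
# character is mapped to a single space.
TRANS = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def punctuation_to_spaces(line):
    """Return line with every punctuation character replaced by a space."""
    return line.translate(TRANS)

print(punctuation_to_spaces("Stop - its wabbit huntin season!"))
```

To process the file, the same function could be called on each line inside the with block shown above.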

score 2 · Accepted Answer

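I'm not 100% sure, because your example output still contains some punctuation - perhaps a typo?

In Python 2.x, you can try the following, since it doesn't look like you're actually replacing with a space so much as just removing the punctuation.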

from string import punctuation
with open('question5.txt') as fin:
    test = fin.read()

new_text = test.translate(None, punctuation)
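The two-argument translate(None, punctuation) call above only works on Python 2 byte strings. As a hedged sketch, the Python 3 equivalent passes the characters to delete as the third argument of str.maketrans; the helper name strip_punctuation is an assumption made for this example.

```python
from string import punctuation

# The third argument of str.maketrans lists characters to delete outright.
DELETE_PUNCT = str.maketrans('', '', punctuation)

def strip_punctuation(text):
    """Return text with all punctuation characters removed (not replaced)."""
    return text.translate(DELETE_PUNCT)

print(strip_punctuation("Heh Heh ... Stop - now!"))
```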

Or, using a regular expression:

import re
new_text = re.sub('[' + re.escape(punctuation) + ']+', '', test)

An example using just a loop:

new_string = ''
for ch in old_string:
    if ch not in punctuation:
        new_string += ch

This can be made more efficient by putting punctuation into a set (or by using one of the methods above).
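A sketch of the set-based variant might look like this. Collecting the kept characters in a list and joining them once is an extra choice made here, not something the answer prescribes, and remove_punctuation is a hypothetical name.

```python
from string import punctuation

PUNCT_SET = set(punctuation)  # set membership tests are constant time on average

def remove_punctuation(old_string):
    """Drop every punctuation character from old_string using a plain loop."""
    kept = []
    for ch in old_string:
        if ch not in PUNCT_SET:
            kept.append(ch)
    # Join once at the end instead of growing a string with += in the loop.
    return ''.join(kept)

print(remove_punctuation("Huntin Wabbits: 101 ways."))
```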

score 1 · Accepted Answer

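First, as elyase shows, you should use the with construct, or else close the file at the end. Also, as he shows, you should never use .readlines() when reading a text file and processing it on the fly. Just loop over the file object itself. It iterates line by line (each line including its trailing \n).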
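As a small illustration of iterating the file object directly, here is a sketch that writes a throwaway sample file first so it can run anywhere. The temporary file and the read_lines helper are assumptions for the demo; in the question the file would simply be question5.txt.

```python
import os
import tempfile

def read_lines(path):
    """Collect lines by iterating the file object; with closes the file for us."""
    collected = []
    with open(path, 'r') as f:
        for line in f:          # one line per iteration, trailing '\n' included
            collected.append(line)
    return collected

# Demo only: create a small sample file, read it back, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Meep Meep!\nI did!\n")
    sample_path = tmp.name

result = read_lines(sample_path)
os.remove(sample_path)
print(result)
```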

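Another problem is lines = str(lines). Your lines is initially a list of strings. str converts it into a single string that looks like "['Meep...', 'wabits...', 'huntin...']". You first loop over that string, which yields single characters (as one-character strings). Naming the variable words does not change that. (If you really want to get the words out, you should use something like for word in line.split():.)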
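The effect is easy to see on a tiny example. The two sample lines below are made up for illustration and stand in for what readlines() would return.

```python
lines = ['Meep Meep!\n', 'I did!\n']    # stand-in for infile.readlines()

as_text = str(lines)                     # repr of the whole list: brackets, quotes and all
first_chars = list(as_text)[:3]          # iterating a string yields single characters

# Splitting each line is what actually produces words.
words = [word for line in lines for word in line.split()]

print(as_text)
print(first_chars)
print(words)
```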

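Then you loop a second time over that single character, which just yields the same single character again (i.e. the inner loop runs only once and adds no functionality).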

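Next, .replace() returns the result of the replacement, but it does not modify the string it is called on. You need to assign the result to some variable. In any case, you cannot use string.punctuation as the old string to be replaced, because that whole sequence of characters will never be found in the source text. A brute-force solution has to loop over the string of punctuation characters and replace them one character at a time.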
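A minimal sketch of that brute-force approach, using only .replace and a for loop as the exercise requires, could look like this. The function name replace_punctuation is an illustrative assumption.

```python
import string

def replace_punctuation(line):
    """Replace each punctuation character in line with a space, one at a time."""
    for ch in string.punctuation:
        # str.replace returns a new string, so the result must be reassigned.
        line = line.replace(ch, ' ')
    return line

cleaned = replace_punctuation("Stop - its wabbit huntin season!")
print(cleaned)
print(len(cleaned.split()))   # word count after cleaning
```

Because each punctuation character becomes a space, a later .split() still separates the words correctly.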

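To sum up, letters still contains a single character, with no replacement applied. Then you print that single character. The print function adds a newline. As a result, you see the original content, in the form of the string representation of a list of strings/lines, displayed as if written in the traditional Chinese way: a single column, top to bottom.

Finally, string.punctuation is just a string constant.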

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

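You can simplify your code by not importing the string module (if you were not told to do so) and using your own string literal containing the characters that should be treated as punctuation.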
