python - Word list from a text file

Question

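I need to create a word list from a text file. The list will be used in a hangman program and must exclude the following:

Duplicate words
Words with fewer than 5 letters
Words that contain 'xx' as a substring
Words that contain uppercase letters

The word list then needs to be written to a file so that each word appears on its own line. The program also needs to output the number of words in the final list.

This is what I have, but it doesn't work properly.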

def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()

    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print L

MakeWordList()

score 2 · Accepted Answer

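With this code you append the same word several times.
You aren't actually filtering any words out; you are just adding each word a different number of times depending on how many of the if checks it passes.

You should combine all of the if conditions: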

if word not in L and len(word) >= 5 and 'xx' not in word and word.islower():
    L.append(word)

Or, if you want it to be more readable, you can split the conditions up:

if word not in L and len(word) >= 5:
    if 'xx' not in word and word.islower():
        L.append(word)

But don't append after each individual check.
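
A minimal sketch of the combined check inside a complete loop, assuming Python 3 and assuming the words have already been split out of the file. The function name filter_words and the sample words are made up for illustration; the extra seen set is just one way to make the duplicate check cheaper than scanning the list each time.

```python
def filter_words(words):
    """Keep each word once if it has 5+ characters, no 'xx', and no uppercase."""
    kept = []
    seen = set()  # words already kept; set lookups avoid rescanning the list
    for word in words:
        if word not in seen and len(word) >= 5 and 'xx' not in word and word.islower():
            seen.add(word)
            kept.append(word)  # exactly one append, after all checks pass
    return kept


sample = ["hangman", "hangman", "cat", "Python", "boxxes", "gallows"]
print(filter_words(sample))  # ['hangman', 'gallows']
```

The order of first appearance is preserved because the result is still a list.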

score 0 · Accepted Answer

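Think about it: in your nested if statements, any word that isn't already in the list gets appended by the very first check. Then, if it is 5 or more characters long, it gets added again (I bet), and again for each further check it passes. You need to rethink the logic in your if statements.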
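
To see the effect concretely, here is the question's nested logic applied to a single word. This is only a sketch, assuming Python 3 and assuming word is a string; in the question's code, line.split(' ') returns a list rather than a string, so len(word) there counts the pieces of the line, not letters. The helper name nested_appends is made up for illustration.

```python
def nested_appends(word):
    """Replay the question's nested ifs for one word, starting from an empty list."""
    L = []
    if word not in L:
        L.append(word)                      # appended regardless of the other rules
        if len(word) in range(5, 100):
            L.append(word)                  # appended again
            if not word.endswith('xx'):     # note: endswith is not a substring test
                L.append(word)              # and again
                if word == word.lower():
                    L.append(word)          # and again
    return L


print(nested_appends("hangman"))  # ['hangman', 'hangman', 'hangman', 'hangman']
print(nested_appends("cat"))      # ['cat']
```

A word that should be rejected ("cat") still ends up in the list once, and a valid word ends up in it four times.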

score 0 · Accepted Answer

Improved code:

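def MakeWordList():
    with open('possible.rtf','r') as f:
        # split() breaks the text on any whitespace, so data is a list of
        # words; iterating over the raw string would yield single characters
        data = f.read().split()
    # set() drops duplicates; the condition applies the other three rules
    return set([word for word in data if len(word) >= 5 and word.islower() and not 'xx' in word])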

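set(iterable) returns a set object with no duplicates (every item in a set must be unique). [word for word ...] is a list comprehension, a shorter way to build a simple list. It iterates over each word in data, which only works if data is a sequence of words: iterating over the raw string returned by f.read() yields single characters, so the text has to be split on whitespace first. The condition if len(word) >= 5 and word.islower() and not 'xx' in word covers the last three requirements (at least 5 letters, no uppercase letters, no 'xx' substring).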
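
The question also asks for the list to be written to a file, one word per line, and for the word count to be reported. A rough sketch of that last step, assuming Python 3: the names build_word_list and write_word_list, the output file name, and the sample text are all hypothetical, and the demo uses an in-memory string and a temporary directory rather than possible.rtf. Sorting is a choice made here because a set has no defined order.

```python
import os
import tempfile


def build_word_list(text):
    """Return the unique words of `text` that pass all filters, sorted."""
    words = text.split()  # split on any whitespace, including newlines
    return sorted({w for w in words
                   if len(w) >= 5 and w.islower() and 'xx' not in w})


def write_word_list(words, path):
    """Write one word per line and return how many words were written."""
    with open(path, 'w') as out:
        for word in words:
            out.write(word + '\n')
    return len(words)


# Demo on a small in-memory sample (hypothetical data)
sample_text = "hangman gallows\ncat Python boxxes hangman\nguess letters"
words = build_word_list(sample_text)
out_path = os.path.join(tempfile.mkdtemp(), 'wordlist.txt')
count = write_word_list(words, out_path)
print(count)  # 4
```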
