python - Python - Importing 127,000+ words into a list, but the function only returns some of the results

Question

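This function is meant to compare each of the 127,000+ words imported from a dictionary file against a length entered by the user. It should then return the number of words of exactly that length. It does do this, up to a point.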

If I enter "15", it returns "0". If I enter "4", it returns "3078".

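I'm positive that there are words 15 characters long, but it returns "0" no matter what. I should also mention that if I enter any value greater than 15, the result is still 0, even though there are words longer than 15 characters.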

try:
    dictionary = open("dictionary.txt")
except:
    print("Dictionary not found")
    exit()


# word_length and showTotal are globals defined elsewhere in the program (not shown)
def reduceDict():
    first_list = []

    for line in dictionary:
        line = line.rstrip()
        if len(line) == word_length:
            for letter in line:
                if len([ln for ln in line if line.count(ln) > 1]) == 0:
                    if first_list.count(line) < 1:
                        first_list.append(line)
                else:
                    continue
    if showTotal == 'y':
        print('|| The possible words remaining are: ||\n ', len(first_list))

score 2 · Accepted Answer

My reading:

if len([ln for ln in line if line.count(ln) > 1]) == 0:

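This line requires that the words in question contain no repeated letters, which would explain why no words are found: by the time you reach 15 letters, repeated letters are very common. Since your description doesn't mention this requirement, if we drop it, we can write: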

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        if len(line) == word_length:
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

try:
    dictionary = open("dictionary.txt")
except FileNotFoundError:
    exit("Dictionary not found")

reduceDict(15, True)

That turns up about 40 words from my Unix words file. If we want to put the unique-letters requirement back:

import re

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        if len(line) == word_length and not re.search(r"(.).*\1", line):
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

As one would expect, this starts returning 0 results at around 13 letters.
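For comparison, here is a minimal sketch (not the answerer's code) of an equivalent set-based unique-letter test next to the regex above, plus a one-pass count of words per length using collections.Counter. The helper names (has_repeated_letter, all_unique, length_histogram) and the small sample list are hypothetical, chosen only for illustration; a real run would build the list from dictionary.txt.

```python
import re
from collections import Counter

def has_repeated_letter(word):
    # Same test as the regex above: some character appears again later on.
    return re.search(r"(.).*\1", word) is not None

def all_unique(word):
    # Set-based alternative: no repeats if and only if the set is as long as the word.
    return len(set(word)) == len(word)

def length_histogram(words, unique_letters=False):
    # Count distinct words by length in one pass. A set removes duplicates
    # in O(1) per word instead of scanning a list with 'in' or .count().
    distinct = set(words)
    if unique_letters:
        distinct = {w for w in distinct if all_unique(w)}
    return Counter(len(w) for w in distinct)

# Hypothetical sample; in practice you might build the list once with
# something like: words = [ln.rstrip() for ln in open("dictionary.txt")]
sample = ["stop", "pots", "tops", "book", "book", "chemotherapeutic", "uncopyrightable"]

# The regex and the set-based check agree on every sample word.
for w in sample:
    assert has_repeated_letter(w) == (not all_unique(w))

print(length_histogram(sample)[4])                        # 4
print(length_histogram(sample, unique_letters=True)[4])   # 3 ("book" is dropped)
print(length_histogram(sample, unique_letters=True)[16])  # 0 (Counter defaults to 0)
```

Reading the file into a list once also helps because reduceDict in the answers iterates over a module-level file object: after one full pass the file is exhausted, so a second call would see no lines unless the file is reopened or rewound with dictionary.seek(0).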

score 0 · Accepted Answer

In your code, you don't need this line:

for letter in line:
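The body of that loop never uses letter, so it simply re-runs the identical check once per character of the word; only the first_list.count(line) < 1 guard keeps the word from being appended more than once. The sketch below reproduces that nested loop on one sample word and counts the checks (the word "cat" and the checks counter are illustrative only):

```python
# Illustrative only: mimic the question's nested loop on a single word
# and count how many times the repeated-letter check is evaluated.
line = "cat"
first_list = []
checks = 0

for letter in line:  # 'letter' is never used in the body
    checks += 1
    if len([ln for ln in line if line.count(ln) > 1]) == 0:
        if first_list.count(line) < 1:
            first_list.append(line)

print(checks)      # 3: one identical check per character
print(first_list)  # ['cat']: appended once, later passes are no-ops
```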

In your list comprehension, if your intention is to iterate over all the words in line, use:

if len([ln for ln in line.split() if line.count(ln) > 1]) == 0:

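In your code, the loop in the list comprehension iterates over every character and checks whether that character occurs in line more than once. So if your file contains chemotherapeutic, it won't be added to first_list, because some of its letters occur more than once. Therefore, unless your file contains words longer than 14 letters in which every letter occurs only once, your code won't find any.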
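To see this concretely, the sketch below wraps the question's comprehension in a small helper (has_no_repeats is a hypothetical name, not from the question) and applies it to chemotherapeutic and to uncopyrightable, a 15-letter word whose letters are all distinct:

```python
def has_no_repeats(line):
    # Same test as in the question: collect every character that occurs
    # more than once; the word passes only if that list is empty.
    return len([ln for ln in line if line.count(ln) > 1]) == 0

# "chemotherapeutic" repeats c, e, h and t, so it is rejected.
word = "chemotherapeutic"
print(sorted({ln for ln in word if word.count(ln) > 1}))  # ['c', 'e', 'h', 't']
print(has_no_repeats(word))  # False

# A 15-letter word with no repeated letters does pass the same test.
print(has_no_repeats("uncopyrightable"))  # True
```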
