python - How can I find the longest word in a text file?

Question

I'm new to Python and I'm creating a game similar to Countdown. I have written a function that finds the longest word in a text file.

What I want now is Python code that finds the longest word(s) that can be formed from 9 given letters.

Each letter can only be used once. So from 'qugteroda', I should get ragouted, outraged, outdare, outread, outrage, readout. I'm using Python 3.3.

My code looks like this:

def Words():
    qfile=open('dict.txt','r')
    long=''
    for line in qfile:
        if len(line)>len(long):
            long=line
    return long

score 5 · Accepted Answer

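So you want to find the longest sorted combination of a set of letters that exists in the dictionary.

To do this, you would use itertools.combinations() with a length equal to the length of your string. You would check all of these combinations against your sorted dictionary and, if you find no match, decrease the combination length.

You also want to load the whole dictionary into a hashed container to reduce search times. I've loaded the words into a dictionary where the key is the sorted string and the value is a list of the words that share that sorted representation.

Something like this: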

import itertools
from collections import defaultdict

words = defaultdict(list)
with open('/usr/share/dict/words') as qfile:
    for word in qfile:
        word = word.rstrip('\n').lower()
        words[''.join(sorted(word))].append(word)

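def longest_anagram(term, words):
    """Return every dictionary word of maximal length buildable from term."""
    term = sorted(term)  # combinations() keeps input order, so each combo is already sorted
    for search_length in range(len(term), 0, -1):
        # a set drops duplicate keys produced by repeated letters in term
        keys = {''.join(combo) for combo in itertools.combinations(term, search_length)}
        # collect the words for every matching key of this length, not just the first
        found = [w for key in sorted(keys) if key in words for w in words[key]]
        if found:
            return found
    return []  # nothing matched; an empty list keeps the caller's loop safe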

found = longest_anagram('qugteroda', words)
for w in found:
    print(w)

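For completeness, I should mention that this approach is suitable for search strings of 18 letters or fewer, because the number of letter combinations doubles with every extra letter. If you need to find the longest anagram from a string longer than 18 letters, you are better off flipping the algorithm so that the dictionary words are sorted into a list by length. You would then iterate through all the words and check whether each one can be built from the input search string, much like @abarnert's answer.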
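A minimal sketch of that flipped approach, assuming the dictionary is already available as an iterable of lowercase words. The function name longest_buildable and the sample word list are hypothetical and used only for illustration; the letter check uses collections.Counter so that each input letter is used at most once.

```python
from collections import Counter

def longest_buildable(letters, dictionary_words):
    """Return the longest words that can be built from letters, each used at most once."""
    available = Counter(letters)
    best = []
    # Longest candidates first, so the scan can stop once words get shorter.
    for word in sorted(set(dictionary_words), key=len, reverse=True):
        if best and len(word) < len(best[0]):
            break
        # Counter subtraction drops non-positive counts, so an empty result
        # means no letter of word is needed more often than it is available.
        if word and not Counter(word) - available:
            best.append(word)
    return sorted(best)

# Hypothetical sample data standing in for a real dictionary file.
sample = ['outraged', 'ragouted', 'outrage', 'readout', 'dodo', 'quota']
print(longest_buildable('qugteroda', sample))
```

The cost here grows with the size of the dictionary rather than with the number of letter combinations, which is why it tends to suit long input strings better.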

score 4 · Accepted Answer

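Your current code returns the longest line in the text file, period.

If you want the longest line that's an anagram of some input string, you need to take an input string and filter out the lines that aren't anagrams of it.

Since you specified no repeated letters, the simplest way to check whether two words are anagrams is to check whether they both have the same set of letters. So: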

def Words(inputletters):
    inputletters = set(inputletters)
    qfile=open('dict.txt','r')
    long=''
    for line in qfile:
        word = line.strip()  # drop the trailing newline before comparing
        if set(word) == inputletters:
            if len(word)>len(long):
                long=word
    return long

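If you're not looking for an exact match but just a subset, replace the == comparison with a call to .issubset(inputletters) on the word's letter set.

Or, if by "you can't repeat letters" you actually meant "both strings must contain exactly the same letters, repeated the same number of times, to count as anagrams", that's easy too: instead of comparing sets of letters, compare sorted lists of letters: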

def Words(inputletters):
    inputletters = sorted(inputletters)
    qfile=open('dict.txt','r')
    long=''
    for line in qfile:
        word = line.strip()  # drop the trailing newline before comparing
        if sorted(word) == inputletters:
            if len(word)>len(long):
                long=word
    return long

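And so on. Once you can define exactly what you're searching for, it's probably a trivial change to the data structure and/or the comparison.

I don't think this is a complete program for whatever it is you want, but it should be enough to either (a) point you in the right direction, or (b) help you clarify the question a bit better.

Meanwhile, there are a few other things that could be improved:

First, you should always close files that you open (preferably with a with statement).

While we're at it, the usual Python coding standard (codified in PEP 8) recommends lowercase function names. And long is not a great name for a variable: although it is no longer a type as of Python 3.0, it may confuse readers who have been using Python since 2.x (which, at this point, is still most of them).

More interestingly, as with many simple for loops in Python, your whole loop can be replaced by a chain of iterator-transforming calls. The result is often more concise, faster, harder to get wrong, and usually more readable.

So, let's write another version that changes all of that and checks for a subset rather than the full set: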

def words(inputletters):
    inputletters = set(inputletters)
    with open('dict.txt') as qfile:
        words = map(str.strip, qfile)
        # keep the words whose letters all come from inputletters
        matching = filter(inputletters.issuperset, words)
        # max() raises ValueError if no word matches
        longest = max(matching, key=len)
        return longest

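Of course, you can merge some of those calls together (or even turn the whole chain into a one-liner, though I think that would be pushing the bounds of readability), or rewrite them as generator expressions, which compose better: compare (set(line.strip()) for line in qfile) with map(set, map(str.strip, qfile)) or map(lambda line: set(line.strip()), qfile).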
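As a rough illustration of the generator-expression style mentioned above, here is one way the same chain might look. The function name longest_from_letters and the path parameter are assumptions added for this sketch, and the try/except fallback is likewise an addition. Like the set-based versions above, it does not stop a word from reusing a letter.

```python
def longest_from_letters(inputletters, path='dict.txt'):
    """Longest word in the file at path whose letters all appear in inputletters."""
    allowed = set(inputletters)
    with open(path) as qfile:
        # Each stage is lazy: nothing is read until max() pulls values through.
        stripped = (line.strip() for line in qfile)
        matching = (word for word in stripped if word and allowed.issuperset(word))
        # max() raises ValueError on an empty sequence, so fall back to ''.
        try:
            return max(matching, key=len)
        except ValueError:
            return ''
```

The max() call has to stay inside the with block, because the generators only read from the file while it is being consumed.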

score 0 · Accepted Answer

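def longestWord(fileName):
    mx = 0
    word = ''  # stays empty if the file contains no words
    # the with statement closes the file even if reading fails
    with open(fileName, 'r') as op:
        words = op.read().split()
    for i in words:
        if len(i) > mx:
            # remember the longest word seen so far and its length
            mx = len(i)
            word = i
    # return the longest word's length and the word itself
    return (mx, word)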
