python - Having trouble with two of my text analysis functions

Question

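I'm having trouble finding the number of unique words in speech text files (three files, actually). I'll just give you my complete code so there is no misunderstanding.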

#This program will serve to analyze text files for the number of words in
#the text file, number of characters, sentences, unique words, and the longest
#word in the text file. This program will also provide the frequency of unique
#words. In particular, the text will be three political speeches which we will
#analyze, building on searching techniques in Python.

def main():
    harper = readFile("Harper's Speech.txt")
    newWords = cleanUpWords(harper)
    print(numCharacters(harper), "Characters.")
    print(numSentances(harper), "Sentances.")
    print(numWords(newWords), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))
    obama1 = readFile("Obama's 2009 Speech.txt")
    newWords = cleanUpWords(obama1)
    print(numCharacters(obama1), "Characters.")
    print(numSentances(obama1), "Sentances.")
    print(numWords(obama1), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))
    obama2 = readFile("Obama's 2008 Speech.txt")
    newWords = cleanUpWords(obama2)
    print(numCharacters(obama2), "Characters.")
    print(numSentances(obama2), "Sentances.")
    print(numWords(obama2), "Words.")
    print(uniqueWords(newWords), "Unique Words.")
    print("The longest word is: ", longestWord(newWords))

def readFile(filename):
    '''Function that reads a text file, then prints the name of file without
'.txt'. The function returns the read file for main() to call, and prints
the file's name so the user knows which file is read'''
    inFile1 = open(filename, "r")
    fileContentsList = inFile1.read()
    inFile1.close()
    print("\n", filename.replace(".txt", "") + ":")
    return fileContentsList

def numCharacters(file):
    '''Function returns the length of the READ file (not readlines because it
would only read the amount of lines and counting characters would be wrong),
which will be the correct amount of total characters in the text file.'''
    return len(file)

def numSentances(file):
    '''Function returns the occurrences of a period, exclamation point, or
a question mark, thus counting the amount of full sentences in the text file.'''
    return file.count(".") + file.count("!") + file.count("?")

def cleanUpWords(file):
        words = (file.replace("-", " ").replace("  ", " ").replace("\n", " "))
        onlyAlpha = ""
        for i in words:
            if i.isalpha() or i == " ":
                onlyAlpha += i
        return onlyAlpha.replace("  ", " ")

def numWords(newWords):
    '''Function finds the amount of words in the text file by returning
the length of the cleaned up version of words from cleanUpWords().'''
    return len(newWords.split())

def uniqueWords(newWords):
    unique = sorted(newWords.split())
    unique = set(unique)
    return str(len(unique))

def longestWord(file):
    max(file.split())

main()

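So my last two functions, uniqueWords and longestWord, don't work properly, or at least my output is wrong. For unique words I should get 527, but for some strange reason I actually get 567. Also, no matter what I do, my longest word function never prints the word. I have tried many ways to get the longest word, and the one above is just one of them, but none of them return anything. Please help me with my two sad functions!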

score 0 · Accepted Answer

Try this:

def longestWord(file):
    return sorted(file.split(), key = len)[-1]
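For context, the longestWord in the question has no return statement, so calling it yields None, and max without key=len picks the alphabetically last word rather than the longest one. A minimal sketch of the difference follows; the names longest_word_original and longest_word_fixed are hypothetical and used only for illustration:

```python
def longest_word_original(text):
    # Mirrors the question's version: the result of max() is discarded,
    # so the function implicitly returns None.
    max(text.split())


def longest_word_fixed(text):
    # key=len makes max() compare word lengths instead of alphabetical order.
    return max(text.split(), key=len)


sample = "My name is Harper"
print(longest_word_original(sample))  # None
print(max(sample.split()))            # name (alphabetical maximum)
print(longest_word_fixed(sample))     # Harper
```

When several words share the maximum length, max(..., key=len) returns the first of them, while sorted(..., key=len)[-1] returns the last, so the two approaches can disagree on ties.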

Or it may be easier to do it inside uniqueWords:

def uniqueWords(newWords):
    unique = set(newWords.split())
    return (str(len(unique)),max(unique, key=len))

info = uniqueWords("My name is Harper")
print("Unique words: " + info[0])
print("Longest word: " + info[1])

Also, you don't need sorted before set to get all the unique words, because a set is an unordered collection of unique elements.
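A quick way to check this is to build the set both ways and compare. The word list below is made up for illustration:

```python
words = "the cat saw the other cat".split()

# A set discards duplicates regardless of input order,
# so sorting first changes nothing about the result.
with_sort = set(sorted(words))
without_sort = set(words)

print(with_sort == without_sort)  # True
print(len(without_sort))          # 4
```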

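Take a look at cleanUpWords. Suppose you have a string like Hello I'm Harper. Harper I am.

After the cleanup you will get 6 words (5 of them unique), and one of them is the word Im, because the apostrophe is stripped out.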
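To see what the cleanup does to that sample, here is a minimal sketch. clean_up_words is a hypothetical, condensed stand-in for the question's cleanUpWords, not the asker's exact code. The sample splits into six words, five of them distinct. The last part assumes, without confirmation from the question, that the expected count of 527 treats differently capitalized forms as the same word, which is one possible contributor to the gap:

```python
def clean_up_words(text):
    # Same idea as the question's cleanUpWords: hyphens and newlines become
    # spaces, then every character that is not a letter or a space is dropped.
    text = text.replace("-", " ").replace("\n", " ")
    return "".join(ch for ch in text if ch.isalpha() or ch == " ")


words = clean_up_words("Hello I'm Harper. Harper I am.").split()
print(words)            # ['Hello', 'Im', 'Harper', 'Harper', 'I', 'am']
print(len(set(words)))  # 5

# Assumption: "The", "the" and "THE" should count as one word.
mixed = clean_up_words("The end. the END!").split()
print(len(set(mixed)))                  # 4 (case-sensitive)
print(len({w.lower() for w in mixed}))  # 2 (case-insensitive)
```

If case turns out not to matter for the expected answer, lowercasing the text before splitting is a small change. Whether it accounts for the whole difference depends on the speech files themselves.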
