
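I'm trying to get my program to report the word that occurs most often in a text file. For example, if I enter "Hello I like pie because they are like so good", the program should print "like occurred the most". I get this error when I run option 3: KeyError: 'h'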

#Prompt the user to enter a block of text.
done = False
textInput = ""
while(done == False):
    nextInput= input()
    if nextInput== "EOF":
        break
    else:
        textInput += nextInput

#Prompt the user to select an option from the Text Analyzer Menu.
print("Welcome to the Text Analyzer Menu! Select an option by typing a number"
    "\n1. shortest word"
    "\n2. longest word"
    "\n3. most common word"
    "\n4. left-column secret message!"
    "\n5. fifth-words secret message!"
    "\n6. word count"
    "\n7. quit")

#Set option to 0.
option = 0

#Use the 'while' to keep looping until the user types in Option 7.
while option !=7:
    option = int(input())

#The error occurs in this specific section of the code.
#If the user selects Option 3,
    elif option == 3:
        word_counter = {}
        for word in textInput:
            if word in textInput:
                word_counter[word] += 1
            else:
                word_counter[word] = 1

        print("The word that showed up the most was: ", word)

5 Answers


I think you probably want to do:

for word in textInput.split():
  ...

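Currently, you are just iterating over the characters of textInput. To iterate over each word, you first have to split the string into a list of words. By default, .split() splits on whitespace, but you can split on something else by passing a separator to split().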
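To see the difference, here is a small sketch comparing iteration over a string with iteration over the result of split(). The sample sentence is the one from the question; the variable names are only for illustration.

```python
text = "Hello I like pie because they are like so good"

# Iterating over a string yields single characters, so the question's loop
# was looking up letters rather than words (hence the KeyError on a letter).
chars = list(text[:5])
print(chars)  # ['H', 'e', 'l', 'l', 'o']

# split() with no argument splits on runs of whitespace.
words = text.split()
print(words[:3])  # ['Hello', 'I', 'like']

# Passing a separator splits on that exact string instead.
print("pie,cake,pie".split(","))  # ['pie', 'cake', 'pie']
```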


Also, you need to check whether the word is already in your dictionary, not in your original string. So try:

if word in word_counter:
  ...
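Putting both fixes together, the counting loop might look like the minimal sketch below. count_words is a hypothetical helper name, not something from the question; the comment inside the loop mentions dict.get, a common shorthand for the same if/else.

```python
def count_words(text_input):
    """Return a dict mapping each whitespace-separated word to its count."""
    word_counter = {}
    for word in text_input.split():
        # Equivalent shorthand: word_counter[word] = word_counter.get(word, 0) + 1
        # Test membership in the dictionary, not in the original string.
        if word in word_counter:
            word_counter[word] += 1
        else:
            word_counter[word] = 1
    return word_counter


print(count_words("Hello I like pie because they are like so good"))
```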

Then, to find the entry that occurs most often:

highest_word = ""
highest_value = 0

for k,v in word_counter.items():
  if v > highest_value:
    highest_value = v
    highest_word = k

Then just print out the values of highest_word and highest_value.
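As an alternative to the manual loop, the built-in max() accepts a key function. The sketch below reuses the answer's word_counter name with an illustrative dict; like the loop in this answer, it reports only one word when there is a tie.

```python
# Illustrative counts; in the program this is the dict built by the counting loop.
word_counter = {"Hello": 1, "I": 1, "like": 2, "pie": 1, "so": 1}

# max() iterates over the keys and compares them by their counts.
highest_word = max(word_counter, key=word_counter.get)
highest_value = word_counter[highest_word]

print("The word that showed up the most was:", highest_word, "with", highest_value, "hits")
```

Note that max() raises ValueError on an empty dict, so empty input would need to be checked first.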


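To keep track of ties, just keep a list of the highest words. If we find a higher count, clear the list and start rebuilding it. Here is the complete program so far: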

textInput = "He likes eating because he likes eating"
word_counter = {}
for word in textInput.split():
  if word in word_counter:
    word_counter[word] += 1
  else:
    word_counter[word] = 1


highest_words = []
highest_value = 0

for k,v in word_counter.items():
  # if we find a new value, create a new list,
  # add the entry and update the highest value
  if v > highest_value:
    highest_words = []
    highest_words.append(k)
    highest_value = v
  # else if the value is the same, add it
  elif v == highest_value:
    highest_words.append(k)

# print out the highest words
for word in highest_words:
  print(word)
answered 2013-07-14T23:58:57.607

Instead of rolling your own counter, use Counter from the collections module.

>>> text = 'blah and stuff and things and stuff'
>>> from collections import Counter
>>> c = Counter(text.split())
>>> c.most_common()
[('and', 3), ('stuff', 2), ('blah', 1), ('things', 1)]
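most_common() still lists tied words one after another, so to report every word that shares the top count you can filter on the first entry's count. A minimal sketch, assuming plain whitespace splitting; top_words is a hypothetical helper name.

```python
from collections import Counter


def top_words(text):
    """Return (count, words) for every word tied for the highest count."""
    counts = Counter(text.split())
    if not counts:
        return 0, []
    # most_common(1) returns a one-element list: [(word, count)].
    highest = counts.most_common(1)[0][1]
    return highest, [word for word, n in counts.items() if n == highest]


print(top_words("blah and stuff and things and stuff"))
```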

Also, as a general point of code style, avoid adding comments like this:

#Set option to 0.
option = 0

It makes your code less readable, not more.

answered 2013-07-15T01:40:02.557

The original answer is of course correct, but keep in mind that it won't show you a tie for first place. A sentence like

A life in the present is a present itself.

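would show only "a" or "present" as the top word (assuming words are compared case-insensitively, so that "A" and "a" count as the same word). In fact, because dictionaries are unordered before Python 3.7, the result you see may not even be the first of the most-repeated words to appear in the text.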
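Whether "a" actually ties with "present" depends on normalization: with a plain split(), "A" and "a" are different keys and "present" wins outright. The sketch below is an assumption, not part of the answer: normalized_counts is a hypothetical helper that lowercases each word and strips surrounding punctuation before counting, which produces the tie described.

```python
import string
from collections import Counter


def normalized_counts(text):
    """Count words case-insensitively, ignoring leading/trailing punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Drop tokens that were nothing but punctuation.
    return Counter(w for w in words if w)


sentence = "A life in the present is a present itself."

raw = Counter(sentence.split())
print(raw["present"], raw["a"])  # 2 1

counts = normalized_counts(sentence)
print(counts["present"], counts["a"])  # 2 2
```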

If you need to report more than one word, may I suggest the following:

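1) Use your current key-value approach to build 'word': hits pairs.
2) Determine the maximum number of hits.
3) Find the values equal to that maximum and add their keys to a list.
4) Iterate over the list to display the words with the most hits.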

For example:

# wordCounter is assumed to be the word -> hits dict built in step 1
greatestNumber = 0
# establish the highest number for wordCounter.values()
for hits in wordCounter.values():
    if hits > greatestNumber:
        greatestNumber = hits

topWords = []
#find the keys that are paired to that value and add them to a list
#we COULD just print them as we iterate, but I would argue that this
#makes this function do too much
for word in wordCounter.keys():
    if wordCounter[word] == greatestNumber:
        topWords.append(word)

#now reveal the results
print("The words that showed up the most, with %d hits:" % greatestNumber)
for word in topWords:
    print(word)

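Depending on whether you are using Python 2.7 or Python 3, your mileage (and syntax) may vary. But ideally, IMHO, you first determine the greatest number of hits, then go back and add the matching entries to a new list.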

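Edit: you should probably use Counter from the collections module, as suggested in another answer. I didn't even know Python had this ready to go. Ha, don't accept my answer unless you really must write your own counter! There is already a module for it.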

answered 2013-07-15T00:47:49.520

With Python 3 (the statistics module is available from Python 3.4), you can use statistics.mode:

>>> from statistics import mode
>>> mode('Hello I like pie because they are like so good'.split())
'like'
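For ties, note that statistics.mode raises StatisticsError on multimodal data before Python 3.8 and returns the first mode encountered from 3.8 on. Python 3.8 also added statistics.multimode, which returns every tied value; a short sketch with an illustrative sentence:

```python
from statistics import multimode  # requires Python 3.8+

words = "blah and stuff and things stuff".split()

# 'and' and 'stuff' both appear twice, so both are returned,
# in the order they were first encountered.
print(multimode(words))  # ['and', 'stuff']
```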
answered 2017-05-14T18:20:49.463

I'm not too keen on Python, but on your last print statement, shouldn't you have a %s?

i.e.: print("The word that showed up the most was: %s" % word)
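For what it's worth, print() with a comma between arguments does not substitute %s; it prints the arguments separated by spaces, so the question's original print call already works in Python 3. Where substitution is wanted, the % operator or an f-string (Python 3.6+) does it. A quick comparison, using an illustrative word:

```python
word = "like"

# Arguments are joined with a space; no %-substitution happens here.
print("The word that showed up the most was:", word)

# The % operator performs the substitution before print() sees the string.
print("The word that showed up the most was: %s" % word)

# f-strings (Python 3.6+) do the same inline.
print(f"The word that showed up the most was: {word}")
```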

answered 2013-07-15T00:08:34.803