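I'm working through the Google Python Code Class and trying to do the Word_Count.py exercise. The goal is to build a dictionary of words (keys) and their counts (values), sort it by count, and return the entries as tuples for printing.
I wrote a helper function to build the dictionary: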
def dict_creator(filename): #helper function to create a dictionary each 'word' is a key and the 'wordcount' is the value
    input_file = open(filename, 'r') #open file as read
    for line in input_file: #for each line of text in the input file
        words = line.split() #split each line into individual words
        for word in words: #for each word in the words list(?)
            word = word.lower() #make each word lower case.
            if word not in word_count: #if the word hasn't been seen before
                word_count[word] = 1 #create a dictionary key with the 'word' and assign a value of 1
            else: word_count[word] += 1 #if 'word' seen before, increase value by 1
    return word_count #return word_count dictionary
    word_count.close()
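For comparison, here is a minimal sketch of the same counting logic with the dictionary created inside the function, so every call starts from an empty dictionary. The names `count_words` and `sample_lines` are hypothetical and not part of the exercise; the function accepts any iterable of lines, so an open file object should work in place of the list.

```python
def count_words(lines):
    """Return a dict mapping each lower-cased word to its count."""
    word_count = {}  # assumption: a fresh, local dictionary on every call
    for line in lines:
        for word in line.split():
            word = word.lower()
            if word not in word_count:
                word_count[word] = 1   # first time this word is seen
            else:
                word_count[word] += 1  # word seen before
    return word_count


# Hypothetical sample input standing in for the lines of a text file.
sample_lines = ["We are not what we should be", "We are not what we need to be"]
counts = count_words(sample_lines)
```

Because the dictionary is local, calling `count_words(sample_lines)` a second time returns the same counts rather than larger ones.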
I'm now trying to produce the dictionary sorted by value (largest to smallest) using the .itemgetter method outlined in this post: link. Here is my code:
def print_words(filename):
    word_count = dict_creator(filename) #run dict_creator on input file (creating dictionary)
    print sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True)
    #print dictionary in total sorted descending by value. Values have been doubled compared to original dictionary?
    for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
        #create sorted list of tuples using operator module functions sorted in an inverse manner
        a = word
        b = word_count[word]
        print a, b #print key and value
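To make the shape of the sorted result concrete, here is a small sketch using a hypothetical three-word dictionary. It assumes nothing beyond the standard `operator` module, and it uses `.items()` instead of `.iteritems()` so that it runs on both Python 2 and Python 3. The point it illustrates is that `sorted(...)` over the items returns a list of `(word, count)` tuples, so each loop element is a pair that can be unpacked directly.

```python
import operator

word_count = {"in": 3, "the": 5, "a": 1}  # hypothetical sample counts

# Sort the (word, count) pairs by the count (index 1), largest first.
pairs = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)

# Each element of pairs is a tuple, so unpack it instead of
# using it as a dictionary key.
for word, count in pairs:
    print("%s %d" % (word, count))
```

With this sample, `pairs` is `[('the', 5), ('in', 3), ('a', 1)]`.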
However, when I run the code on the test file and on a smaller file, it raises a KeyError (shown below).
Traceback (most recent call last):
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 74, in <module>
    print_words(lorem_ipsum) #run input file through print_words
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 70, in print_words
    b = word_count[word]
KeyError: ('in', 3)
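The key in the error message is a tuple rather than a plain word. As a minimal sketch with a hypothetical two-word dictionary, the same message can be reproduced by indexing a dictionary with one of its own `(word, count)` pairs, which is the kind of value a loop over the sorted items yields:

```python
word_count = {"in": 3, "the": 5}  # hypothetical sample counts

item = ("in", 3)  # a (word, count) pair, as produced by iterating over items

try:
    word_count[item]  # the dictionary's keys are words, not pairs
except KeyError as err:
    # err.args[0] holds the key that was not found
    message = "KeyError: %r" % (err.args[0],)
    print(message)
```

Indexing with `item[0]` instead would look up the word itself.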
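I printed both the original dictionary and the sorted one, and after sorting all of the values appear to have doubled. I've looked through several threads on this kind of problem and checked the .itemgetter documentation, but I can't find anyone else with a similar issue.
Can anyone point out what is causing my code to iterate over the dictionary a second time in the word_count function, causing the values to increase?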
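One way counts can double is sketched below. It rests on an assumption the posted code does not confirm: that the dictionary lives at module level and the counting function is called more than once on the same input. The names `shared_count` and `count_into_shared` are hypothetical.

```python
shared_count = {}  # assumption: a module-level dictionary shared across calls


def count_into_shared(lines):
    # Adds to shared_count instead of starting from an empty dictionary.
    for line in lines:
        for word in line.lower().split():
            shared_count[word] = shared_count.get(word, 0) + 1
    return shared_count


lines = ["in the house in the city in spring"]  # hypothetical sample input
first = dict(count_into_shared(lines))   # snapshot after one call
second = dict(count_into_shared(lines))  # same input counted a second time
```

Here `first["in"]` is 3 and `second["in"]` is 6, because the second call keeps adding to the counts left behind by the first.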
Thanks!
Someone