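I'm working through Google's Python Class and trying to do the Word_Count.py exercise. The goal is to build a dictionary of words (keys) and their counts (values), sort it by count, and return the entries as tuples for printing.

I created a helper function to build my dictionary: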

def dict_creator(filename): #helper function to create a dictionary each 'word' is a key and the 'wordcount' is the value
            input_file = open(filename, 'r') #open file as read
            for line in input_file: #for each line of text in the input file
                    words = line.split() #split each line into individual words
                    for word in words: #for each word in the words list(?)
                            word = word.lower() #make each word lower case.
                            if word not in word_count: #if the word hasn't been seen before
                                    word_count[word] = 1 #create a dictionary key with the 'word' and assign a value of 1
                            else: word_count[word] += 1 #if 'word' seen before, increase value by 1
            return word_count #return word_count dictionary
            word_count.close()

I am now using the operator.itemgetter approach outlined in another post to sort the dictionary by value (largest to smallest). Here is my code:

def print_words(filename):
        word_count = dict_creator(filename) #run dict_creator on input file (creating dictionary)
        print sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True)
        #print dictionary in total sorted descending by value. Values have been doubled compared to original dictionary?
        for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
                #create sorted list of tuples using operator module functions sorted in an inverse manner
                a = word
                b = word_count[word]
                print a, b #print key and value

However, when I run the code on the test file and on a smaller file, it raises a KeyError (shown below).

Traceback (most recent call last):
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 74, in <module>
    print_words(lorem_ipsum) #run input file through print_words
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 70, in print_words
    b = word_count[word]
KeyError: ('in', 3)

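I printed both the original dictionary and the sorted one, and after sorting every value appears to have doubled. I have looked through several threads related to this kind of problem and checked the itemgetter documentation, but I can't find anyone else with a similar issue.

Can anyone point out what is causing my code to iterate over the dictionary a second time in the word_count function, increasing the values?

Thanks!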

1 Answer

(1) You never actually define word_count inside dict_creator. I would expect to see

word_count = {}

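at the start of the function. Without it, the word_count being modified must be defined somewhere else, at global scope, so every call to dict_creator adds to the same word_count dictionary and the values keep growing. At least in the code you have shown, there is only one word_count.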
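A minimal sketch of the effect described in (1), written for Python 3 rather than the Python 2 used in the question. The names shared_counts, count_shared, and count_local and the sample word list are made up for illustration; they are not from the original code.

```python
# Hypothetical reproduction of the "doubled values" symptom.
shared_counts = {}  # module-level dictionary, shared by every call


def count_shared(words):
    # Mutates the module-level dictionary, so counts pile up across calls.
    for word in words:
        shared_counts[word] = shared_counts.get(word, 0) + 1
    return shared_counts


def count_local(words):
    counts = {}  # fresh dictionary on every call
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = ["in", "the", "in"]

count_shared(sample)
doubled = dict(count_shared(sample))  # second call adds to the first call's counts

count_local(sample)
fresh = count_local(sample)  # second call starts again from an empty dictionary

print(doubled)
print(fresh)
```

Calling the shared version twice on the same input doubles every count, which matches the symptom in the question if dict_creator ran more than once against a global word_count.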

(2) Regarding the KeyError:

   for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
            #create sorted list of tuples using operator module functions sorted in an inverse manner
            a = word
            b = word_count[word]

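iteritems() yields (key, value) tuples, so word is already something like ('dict_creator', 1). You can simply print it as is. Calling word_count[word] tries to use the whole (key, value) tuple as a key, which is why the KeyError shows ('in', 3). In other words, even though you named the variable word, it is really word_and_count, with word, count = word_and_count.

(3) In this part: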
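As a sketch of the fix for (2), the loop can unpack each tuple directly instead of looking the tuple up in the dictionary. This version assumes Python 3, where items() takes the place of iteritems() and print is a function; the sample dictionary is made up.

```python
import operator

word_count = {"in": 3, "the": 5, "a": 1}  # made-up sample data

# sorted() returns a list of (word, count) tuples, largest count first.
ranked = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)

for word, count in ranked:  # unpack the tuple; no second dictionary lookup
    print(word, count)
```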

        return word_count #return word_count dictionary
        word_count.close()

I think you meant input_file.close(), but closing the file "after" the return is pointless, because that line is never executed. Another option is to use the with idiom:

with open(filename) as input_file:
    code_goes_here = True
return word_count

Here the file is closed automatically.

After making the changes above, your code seems to work for me.
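Putting points (1) and (3) together, dict_creator might look like the sketch below. It is written for Python 3, uses dict.get in place of the if/else from the question, and the temporary file with its sample text exists only to make the example self-contained.

```python
import os
import tempfile


def dict_creator(filename):
    """Return a dict mapping each lowercased word in the file to its count."""
    word_count = {}  # local, so repeated calls do not accumulate
    with open(filename) as input_file:
        for line in input_file:
            for word in line.split():
                word = word.lower()
                word_count[word] = word_count.get(word, 0) + 1
    # The with block has already closed input_file by this point.
    return word_count


# Usage with a throwaway file (the sample text is made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("In the beginning\nin THE end\n")
    tmp_path = tmp.name

counts = dict_creator(tmp_path)
os.remove(tmp_path)
print(counts)
```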

answered 2013-01-29T11:19:51.450