python - Strange output from .itemgetter when sorting a list by value

Question

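So I'm working through Google's Python Class and trying to do the Word_Count.py exercise. The goal is to build a dictionary of words (keys) sorted by word count (values) and return them as tuples for printing.

I created a helper function to build my dictionary: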

def dict_creator(filename): #helper function to create a dictionary each 'word' is a key and the 'wordcount' is the value
            input_file = open(filename, 'r') #open file as read
            for line in input_file: #for each line of text in the input file
                    words = line.split() #split each line into individual words
                    for word in words: #for each word in the words list(?)
                            word = word.lower() #make each word lower case.
                            if word not in word_count: #if the word hasn't been seen before
                                    word_count[word] = 1 #create a dictionary key with the 'word' and assign a value of 1
                            else: word_count[word] += 1 #if 'word' seen before, increase value by 1
            return word_count #return word_count dictionary
            word_count.close()

I am now using the .itemgetter approach outlined in this post to create a dictionary sorted by value (largest to smallest): link. Here is my code:

def print_words(filename):
        word_count = dict_creator(filename) #run dict_creator on input file (creating dictionary)
        print sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True)
        #print dictionary in total sorted descending by value. Values have been doubled compared to original dictionary?
        for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
                #create sorted list of tuples using operator module functions sorted in an inverse manner
                a = word
                b = word_count[word]
                print a, b #print key and value

However, when I run the code on the test file and on smaller files, it raises a KeyError (shown below).

Traceback (most recent call last):
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 74, in <module>
    print_words(lorem_ipsum) #run input file through print_words
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 70, in print_words
    b = word_count[word]
KeyError: ('in', 3)

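I printed both the original dictionary and the sorted one, and all the values appear to have doubled once the dictionary is sorted. I've looked through several threads related to this kind of problem and checked the .itemgetter documentation, but I can't seem to find anyone else with a similar issue.

Can anyone point out what is causing my code to iterate over the dictionary a second time in the word_count function, increasing the values?

Thanks!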


score 1 · Accepted Answer

(1) You never actually define word_count in dict_creator. I would expect to see

word_count = {}

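at the start. This means that whatever word_count it is modifying is defined elsewhere, at global scope, so every time you call dict_creator it adds to the same word_count dictionary, increasing the values. At least from the code you've shown, you only have one word_count.

(2) Regarding the KeyError: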
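To make the doubling concrete, here is a minimal sketch of the difference between counting into a shared module-level dictionary and counting into a fresh local one. The names shared_counts, count_into_global and count_into_local are hypothetical, and the code is written for Python 3; it only illustrates the effect, it is not the asker's actual script.

```python
# Hypothetical module-level dictionary, standing in for a global word_count.
shared_counts = {}

def count_into_global(words):
    # No local dictionary is created, so every call mutates shared_counts.
    for word in words:
        shared_counts[word] = shared_counts.get(word, 0) + 1
    return shared_counts

def count_into_local(words):
    word_count = {}  # a fresh dictionary on every call
    for word in words:
        word_count[word] = word_count.get(word, 0) + 1
    return word_count

sample = ["in", "the", "in"]  # made-up input

count_into_global(sample)
doubled = dict(count_into_global(sample))  # second call adds to the first call's counts

fresh_first = count_into_local(sample)
fresh_second = count_into_local(sample)  # unaffected by the earlier call
```

Calling the global version twice leaves every count doubled, while the local version returns the same counts no matter how often it is called.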

   for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
            #create sorted list of tuples using operator module functions sorted in an inverse manner
            a = word
            b = word_count[word]

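iteritems() returns tuples, so word is already something like ('dict_creator', 1). You can simply print it as is. Calling word_count[word] tries to use the (key, value) tuple as a key. IOW, even though you named the variable word, it is really word_and_count, with word, count = word_and_count.

(3) In this part: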
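As a rough illustration of the unpacking described above, the loop can take the two halves of each tuple directly. This sketch assumes Python 3, where items() takes the place of iteritems() and print is a function; the sample counts are made up.

```python
import operator

word_count = {"in": 3, "the": 5, "lorem": 1}  # made-up sample counts

# Each element of items() is a (word, count) tuple; itemgetter(1) sorts by the count.
sorted_pairs = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)

for word, count in sorted_pairs:
    # The count comes from the tuple itself, so no second dictionary lookup is needed.
    print(word, count)
```

Because the tuple already carries the count, there is no word_count[word] lookup left to raise the KeyError.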

        return word_count #return word_count dictionary
        word_count.close()

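I think you meant input_file.close(), but there is no point closing the file after you return, because that line will never be executed. Another option is to use the with idiom: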

with open(filename) as input_file:
    code_goes_here = True
return word_count

Here the file will be closed automatically.

After making the changes above, your code seems to work for me.
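Putting the three points together, one possible version of the helper might look like the sketch below. It assumes Python 3, and the temporary file and its contents are invented purely so the example can run on its own; it is not the asker's final code.

```python
import os
import tempfile

def dict_creator(filename):
    """Return a dict mapping each lower-cased word in the file to its count."""
    word_count = {}  # local, so repeated calls do not accumulate
    with open(filename) as input_file:
        for line in input_file:
            for word in line.split():
                word = word.lower()
                word_count[word] = word_count.get(word, 0) + 1
    return word_count  # the with block has already closed the file

# Usage with a throwaway file (contents are made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("In the beginning\nin THE end\n")
    tmp_path = tmp.name

counts = dict_creator(tmp_path)
os.remove(tmp_path)
```

Returning after the with block keeps the close implicit, so there is no unreachable close() call left after the return.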
