python - How to turn a dictionary "inside out"

Question

Disclaimer: I've only just started learning Python.

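I have a function, book_index, that counts how many times each word occurs in a text file. It uses the words as keys and the counts as values, and returns them in a dictionary. Here is my code: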

alice = open('location of the file', 'r', encoding = "cp1252")

def book_index(alice):
    """Count how often each word occurs in `alice`, an already-opened file."""
    worddict = {}
    line = 0

    for ln in alice:
        words = ln.split()
        for wd in words:
            if wd not in worddict:
                worddict[wd] = 1  # wd not seen yet: start its count at 1
            else:
                worddict[wd] = worddict[wd] + 1  # wd already seen: increase its count by 1
        line = line + 1  # number of lines read (not used in the result)
    return worddict
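To see what book_index returns without a real file, you can pass it any iterable of lines. The sketch below assumes io.StringIO as a stand-in for the opened file, with a made-up sample text:

```python
import io

def book_index(alice):
    """Count how often each whitespace-separated word occurs in the lines of `alice`."""
    worddict = {}
    for ln in alice:
        for wd in ln.split():
            if wd not in worddict:
                worddict[wd] = 1
            else:
                worddict[wd] = worddict[wd] + 1
    return worddict

# io.StringIO behaves like an opened text file: iterating it yields lines.
sample = io.StringIO("hello world\nhi hello\nhi world world world\n")
worddict = book_index(sample)
print(worddict)  # {'hello': 2, 'world': 4, 'hi': 2}
```

Note that str.split() splits only on whitespace, so "hello" and "hello," are counted as different words.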

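I need to turn the dictionary "inside out": use the counts as keys and, as values, the words that occur that many times. For example: [2, 'hello', 'hi'], where 'hello' and 'hi' each occur twice in the text file.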

Do I need to loop through my existing dictionary, or through the text file again?

score 4 · Accepted Answer

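Because a dictionary maps keys to values, you can't efficiently filter it by value. You have to iterate over all of its items to find the keys that have a particular value.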

This prints every key in dictionary d whose value equals searchValue:

for k, v in d.items():
    if v == searchValue:
        print(k)
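Wrapped in a function (the name keys_with_value is made up for illustration), the same linear scan can return the matching keys instead of printing them:

```python
def keys_with_value(d, search_value):
    """Return every key of d whose value equals search_value (one full pass over d)."""
    return [k for k, v in d.items() if v == search_value]

worddict = {"hello": 2, "hi": 2, "world": 4}
print(keys_with_value(worddict, 2))  # ['hello', 'hi']
print(keys_with_value(worddict, 3))  # []
```

Each call walks the whole dictionary, so if you need the words for many different counts, building the inverted dictionary once is cheaper.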

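Regarding your book_index function, note that you can use Counter from the standard library's collections module to count things. A Counter is essentially a dictionary whose values are counts, and it handles missing keys automatically (they count as 0). With a Counter, your code would look like this: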

from collections import Counter
def book_index(alice):
    worddict = Counter()
    for ln in alice:
        worddict.update(ln.split())
    return worddict
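A quick check of the Counter version, again assuming io.StringIO as a stand-in for the file. Looking up a word that never appeared returns 0 instead of raising KeyError, and does not add the word to the Counter:

```python
import io
from collections import Counter

def book_index(alice):
    worddict = Counter()
    for ln in alice:
        # update() with an iterable adds 1 for each element it yields
        worddict.update(ln.split())
    return worddict

worddict = book_index(io.StringIO("hello world\nhi hello\n"))
print(worddict["hello"])     # 2
print(worddict["absent"])    # 0, no KeyError
print("absent" in worddict)  # False: the lookup did not insert the key
```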

Or, as roippi suggested in a comment on another answer, simply: worddict = Counter(word for line in alice for word in line.split())

score 3 · Accepted Answer

Personally, I'd recommend a Counter object here; it is designed for exactly this kind of application. For example:

from collections import Counter
counter = Counter()
for ln in alice:
    counter.update(ln.split())

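This gives you the relevant dictionary, and if you then read the Counter documentation, you'll see that you can retrieve just the most common results.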
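For example, Counter.most_common(n) returns the n most frequent (word, count) pairs, most frequent first. The counts below are made up for illustration:

```python
from collections import Counter

counter = Counter({"hello": 2, "hi": 3, "world": 4, "rabbit": 1})
top_two = counter.most_common(2)
print(top_two)  # [('world', 4), ('hi', 3)]
```

Called without an argument, most_common() returns all pairs in that order.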

This may not cover every case in your question, but it's a bit nicer than iterating manually, even for building the first dictionary.

If you really want to "flip" this dictionary, you can do something along these lines:

matching_values = lambda value: (word for word, freq in worddict.items() if freq == value)
inverted = {value: matching_values(value) for value in set(worddict.values())}
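One consequence of this lazy approach worth knowing: if each value is created by calling matching_values(value), it is a generator, which produces its words only when you iterate it and can be consumed only once. A minimal demonstration with made-up counts:

```python
worddict = {"hello": 2, "hi": 2, "world": 4}

matching_values = lambda value: (word for word, freq in worddict.items() if freq == value)
inverted = {value: matching_values(value) for value in set(worddict.values())}

first = list(inverted[2])   # iterating the generator scans worddict now
second = list(inverted[2])  # the same generator is already exhausted
print(first)   # ['hello', 'hi']
print(second)  # []
```

If you need the words more than once, call matching_values(value) again or store list(...) instead of the generator.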

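The solution above has some advantage over the others because of deferred execution: each value only scans the dictionary when you actually iterate it. So in very sparse cases, where you don't expect to look up many counts, or when you only want to find out which counts actually have entries (the keys), it is faster, because it doesn't walk the dictionary for every count up front.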

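That said, this solution will usually be worse than a plain iterative solution, because it scans the whole dictionary again every time you need the words for another count.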

It's nothing fundamentally different, but I didn't want to duplicate the other answers here.

score 2 · Accepted Answer

Loop through your existing dictionary. Here's an example using dict.setdefault():

countdict = {}
for k, v in worddict.items():
    countdict.setdefault(v, []).append(k)

Or with collections.defaultdict:

import collections
countdict = collections.defaultdict(list)
for k, v in worddict.items():
    countdict[v].append(k)

Personally, I prefer the setdefault() approach, because the result is a plain dict.

Example:

>>> worddict = {"hello": 2, "hi": 2, "world": 4}
>>> countdict = {}
>>> for k, v in worddict.items():
...     countdict.setdefault(v, []).append(k)
...
>>> countdict
{2: ['hello', 'hi'], 4: ['world']}
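If you build with defaultdict but want a plain dict afterwards, converting with dict() is one option. The sketch also shows why the difference can matter: looking up a missing key on a defaultdict silently inserts it, while the plain copy is unaffected:

```python
import collections

worddict = {"hello": 2, "hi": 2, "world": 4}
countdict = collections.defaultdict(list)
for k, v in worddict.items():
    countdict[v].append(k)

plain = dict(countdict)  # snapshot as an ordinary dict
print(plain)             # {2: ['hello', 'hi'], 4: ['world']}

countdict[3]             # on the defaultdict, this lookup inserts an empty list
print(countdict[3], 3 in countdict)  # [] True
print(3 in plain)        # False: plain is unaffected
```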

As pointed out in some of the other answers, you can simplify book_index by using collections.Counter.
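Putting the pieces together, here is a sketch that counts with Counter, inverts with setdefault(), and then builds lists in the [count, word, word, ...] shape from the question. The function name invert_counts and the sample text are hypothetical:

```python
import io
from collections import Counter

def invert_counts(lines):
    """Map each count to the list of words that occur exactly that many times."""
    worddict = Counter(word for line in lines for word in line.split())
    countdict = {}
    for word, count in worddict.items():
        countdict.setdefault(count, []).append(word)
    return countdict

countdict = invert_counts(io.StringIO("hello world\nhi hello\nhi world world world\n"))
print(countdict)  # {2: ['hello', 'hi'], 4: ['world']}

# The [count, word, word, ...] rows from the question, highest count first
rows = [[count] + words for count, words in sorted(countdict.items(), reverse=True)]
print(rows)  # [[4, 'world'], [2, 'hello', 'hi']]
```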

score 1 · Accepted Answer

Without duplicates (only if no two words share the same count):

word_by_count_dict = {value: key for key, value in worddict.items()}

See PEP 274 for Python's dict comprehensions: http://www.python.org/dev/peps/pep-0274/
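To see why this only works without duplicates: when two words share a count, the comprehension keeps whichever pair it sees last and silently drops the other. With made-up counts:

```python
worddict = {"hello": 2, "hi": 2, "world": 4}
word_by_count_dict = {value: key for key, value in worddict.items()}
print(word_by_count_dict)  # {2: 'hi', 4: 'world'}  ('hello' was overwritten)

# The comprehension is safe only when every count is unique:
print(len(set(worddict.values())) == len(worddict))  # False
```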

With duplicates:

import collections

words_by_count_dict = collections.defaultdict(list)
for key, value in worddict.items():
    words_by_count_dict[value].append(key)

That way, for example:

words_by_count_dict[2] == ["hello", "hi"]
