
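I'd like to know how to sum values with a Python dictionary. I'm reading a huge file line by line and incrementing the value for each particular key. Suppose I have the following toy file: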

word1 5
word2 3
word3 1
word1 2
word2 1

The result I expect is:

my_dict = {'word1':7, 'word2':4, 'word3':1}

Below is what I have so far.

my_dict = {}
with open('test.txt') as f:
    for line in f:
        line = line.rstrip()
        line = line.split()
        word = line[0]
        frequency = line[1]
        # raises KeyError: 'word1' here, because the key does not exist yet
        my_dict[word] += int(frequency)
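One way to make the loop above work while keeping a plain dict, sketched under a couple of assumptions: the helper sum_counts is hypothetical and takes any iterable of lines, and io.StringIO stands in for test.txt so the snippet runs on its own. dict.get(word, 0) returns 0 for a word that has not been seen yet, so the first occurrence no longer fails.

```python
import io

def sum_counts(lines):
    """Sum the second column per word using a plain dict (hypothetical helper)."""
    totals = {}
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        word, frequency = parts
        # dict.get returns 0 when the word has not been seen yet
        totals[word] = totals.get(word, 0) + int(frequency)
    return totals

# io.StringIO stands in for open('test.txt') in this sketch
toy_file = io.StringIO("word1 5\nword2 3\nword3 1\nword1 2\nword2 1\n")
my_dict = sum_counts(toy_file)
print(my_dict)  # {'word1': 7, 'word2': 4, 'word3': 1}
```

The blank-line check is an addition: without it, an empty line would make line[0] raise IndexError in the original loop.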

3 Answers


Use a collections.Counter() object:

from collections import Counter

my_dict = Counter()

with open('test.txt') as f:
    for line in f:
        word, freq = line.split()
        my_dict[word] += int(freq)

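Note that the str.rstrip() call is not needed: str.split() called with no arguments also strips leading and trailing whitespace, including the newline.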
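A quick check of that split() behavior; the sample string is just an illustrative line from the toy file.

```python
line = "word1 5\n"

# With no argument, split() splits on runs of whitespace and drops
# leading/trailing whitespace, so the trailing newline disappears.
print(line.split())           # ['word1', '5']

# Calling rstrip() first gives the same result, so it is redundant here.
print(line.rstrip().split())  # ['word1', '5']

# By contrast, split(' ') keeps the newline attached to the last field.
print(line.split(' '))        # ['word1', '5\n']
```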

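Besides defaulting missing keys to 0, a Counter() object has other advantages, such as listing words ordered by frequency (including only the top N, via most_common()) and adding or subtracting counters.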

Running the Counter code above on the toy file gives:

>>> my_dict
Counter({'word1': 7, 'word2': 4, 'word3': 1})
>>> for word, freq in my_dict.most_common():
...     print(word, freq)
... 
word1 7
word2 4
word3 1
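To illustrate the other Counter features mentioned in this answer (top N, addition, subtraction), here is a small sketch. The counter is built directly from the toy totals, and the second counter `more` is made-up data for illustration.

```python
from collections import Counter

# Totals from the toy file, built directly for illustration
counts = Counter({'word1': 7, 'word2': 4, 'word3': 1})

# Top N words by frequency
print(counts.most_common(2))  # [('word1', 7), ('word2', 4)]

# Missing keys read as 0 without raising KeyError (and are not inserted)
print(counts['word4'])        # 0

# Adding and subtracting counters works key by key;
# subtraction drops keys whose result is zero or negative
more = Counter({'word1': 1, 'word4': 2})
print(counts + more)  # Counter({'word1': 8, 'word2': 4, 'word4': 2, 'word3': 1})
print(counts - more)  # Counter({'word1': 6, 'word2': 4, 'word3': 1})
```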
answered 2013-09-07T08:42:29.323

You can use a defaultdict:

import collections
d = collections.defaultdict(int)
with open('test.txt') as f:
    for row in f:
        temp = row.split()
        d[temp[0]] += int(temp[1])

d is now:

defaultdict(<type 'int'>, {'word1': 7, 'word3': 1, 'word2': 4})
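Why defaultdict(int) works, as a small sketch: the factory int is called with no arguments for every missing key, and int() returns 0. The io.StringIO input stands in for the file, and the key 'word9' is a made-up example used to show one caveat: merely reading a missing key inserts it.

```python
import io
from collections import defaultdict

# defaultdict(int) calls int() for every missing key, and int() returns 0
print(int())  # 0

d = defaultdict(int)
# io.StringIO stands in for the input file in this sketch
for row in io.StringIO("word1 5\nword2 3\nword3 1\nword1 2\nword2 1\n"):
    word, freq = row.split()
    d[word] += int(freq)

# A plain-dict copy no longer invents keys on lookup
plain = dict(d)
print(plain)  # {'word1': 7, 'word2': 4, 'word3': 1}

# Caveat: merely reading a missing key from the defaultdict inserts it
print(d['word9'])        # 0
print('word9' in d)      # True
print('word9' in plain)  # False
```

Converting with dict(d) once counting is finished is one option when later code should get a KeyError for unknown words instead of a silent 0.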
answered 2013-09-07T08:43:17.480

If anyone is dealing with multiple columns (in my case I had the same problem, but with 4 columns):

This should do the trick:

from collections import defaultdict

my_dict = defaultdict(int)

with open("input") as f:
    for line in f:
        if line.strip():
            items = line.split()
            freq = items[-1]
            lemma = tuple(items[:-1]) 

            my_dict[lemma] += int(freq)

for items, freq in my_dict.items():
    print(items, freq)
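A runnable sketch of the same multi-column idea on made-up data: the 4-column sample (three key columns, frequency last) is hypothetical, and io.StringIO stands in for the "input" file. The key is a tuple because lists are unhashable and cannot be dict keys.

```python
import io
from collections import defaultdict

# Hypothetical 4-column input: three key columns, frequency in the last column
sample = io.StringIO(
    "the DET det 5\n"
    "run VERB run 2\n"
    "\n"
    "the DET det 3\n"
)

my_dict = defaultdict(int)
for line in sample:
    if line.strip():                  # skip blank lines
        items = line.split()
        lemma = tuple(items[:-1])     # every column except the last forms the key
        my_dict[lemma] += int(items[-1])

for key, freq in my_dict.items():
    print(key, freq)
# ('the', 'DET', 'det') 8
# ('run', 'VERB', 'run') 2
```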
answered 2013-11-07T15:38:01.013