python - Python: Elegantly merge dictionaries with sum() of values

Question

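I'm trying to merge logs from several servers. Each log is a list of tuples (date, count). A date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.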

Here's my attempt, with some example data:

from collections import defaultdict

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]

output=defaultdict(int)
for d in input:
    for item in d:
        output[item[0]] += item[1]
print dict(output)

This gives:

{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}

As expected.

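I'm about to go nuts because of a colleague who saw the code. She insists there must be a much more Pythonic and elegant way to do it, without these nested for loops. Any ideas?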

score 41 · Accepted Answer

It doesn't get simpler than this, I think:

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]

from collections import Counter

print sum(
    (Counter(dict(x)) for x in input),
    Counter())

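Note that Counter (also known as a multiset) is the most natural data structure for your data: a type of set to which elements can belong more than once, or equivalently, a map with the semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.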
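To make the multiset semantics concrete, here is a minimal sketch written for Python 3 (the answer's own snippets target Python 2). It reuses the question's sample data; the names `counters` and `merged` are only illustrative.

```python
from collections import Counter

# Each server log is a list of (date, count) pairs, as in the question.
a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

# dict(x) turns the pairs into a mapping; Counter wraps that mapping.
# Caveat: dict(x) keeps only the last pair if a date repeats within one list.
counters = [Counter(dict(x)) for x in (a, b, c)]

# Adding two Counters sums the counts key by key.
merged = counters[1] + counters[2]
print(merged["15.5"])  # 200

# A missing key reads as 0 instead of raising KeyError.
print(merged["13.5"])  # 0
```

If a date can repeat inside a single server's list, building the Counter through `dict(x)` would silently drop the earlier entries, so that case likely needs an explicit loop instead.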

Also possible:

from collections import Counter
from operator import add

print reduce(add, (Counter(dict(x)) for x in input))

Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and lets you skip passing the redundant initial value.
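A small Python 3 sketch of why the initial value matters. It assumes only the standard library; note that on Python 3, reduce has to be imported from functools.

```python
from collections import Counter
from functools import reduce  # a builtin on Python 2, in functools on Python 3
from operator import add

counters = [Counter(a=1), Counter(a=2, b=5)]

# sum() starts from the integer 0 by default, and 0 + Counter is not defined.
try:
    sum(counters)
    default_start_works = True
except TypeError:
    default_start_works = False

# With an explicit empty Counter as the start value, sum() works.
via_sum = sum(counters, Counter())

# reduce() uses the first element as the start value, so none is needed.
via_reduce = reduce(add, counters)

print(default_start_works)    # False
print(via_sum == via_reduce)  # True
```

One trade-off: without an initial value, reduce raises TypeError on an empty sequence, whereas sum(seq, Counter()) simply returns the empty Counter.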

Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
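As a rough illustration of the difference between the two operators, the following Python 3 sketch uses made-up counts (not the question's data) so that the intersection is visibly different from the sum.

```python
from collections import Counter
from functools import reduce
from operator import add, and_

# Illustrative counts only; "15.5" is the one key the two logs share.
b = Counter({"14.5": 100, "15.5": 100})
c = Counter({"15.5": 40, "16.5": 100})

total = reduce(add, [b, c])    # sum of counts per key
common = reduce(and_, [b, c])  # minimum count per key, shared keys only

print(total["15.5"])  # 140
print(dict(common))   # {'15.5': 40}
```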

The variants above are quite slow, because every step creates a new Counter. Let's fix that.

We know that Counter+Counter returns a new Counter with the merged data. That is fine, but we want to avoid creating the extra objects. Let's use Counter.update instead:

update(self, iterable=None, **kwds) unbound collections.Counter method

Like dict.update() but add counts instead of replacing them. Source can be an iterable, a dictionary, or another Counter instance.

That's what we want. Let's wrap it in a function compatible with reduce and see what happens.

def updateInPlace(a,b):
    a.update(b)
    return a

print reduce(updateInPlace, (Counter(dict(x)) for x in input))

This is only marginally slower than the OP's solution.

Benchmark: http://ideone.com/7IzSx (updated with yet another solution, thanks to astynax)

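(Also: if you desperately want a one-liner, you can replace updateInPlace with lambda x,y: x.update(y) or x, which works the same way and even proves to be slightly faster, but fails at readability. Don't :-))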
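The in-place approach and the one-liner both rely on two facts: Counter.update adds counts rather than replacing them, and it returns None. A short Python 3 sketch of both, where `logs` is just an illustrative name for the question's data:

```python
from collections import Counter
from functools import reduce

logs = [
    [("13.5", 100)],
    [("14.5", 100), ("15.5", 100)],
    [("15.5", 100), ("16.5", 100)],
]

# dict.update replaces existing values...
plain = {"15.5": 100}
plain.update({"15.5": 100})

# ...while Counter.update adds to them, mutating the Counter in place.
counted = Counter({"15.5": 100})
result_of_update = counted.update({"15.5": 100})

print(plain["15.5"])     # 100
print(counted["15.5"])   # 200
print(result_of_update)  # None

# Because update() returns None, "x.update(y) or x" evaluates to x.
merged = reduce(lambda x, y: x.update(y) or x,
                (Counter(dict(log)) for log in logs))
print(merged["15.5"])  # 200
```

Note that the reduction mutates the first Counter it receives. That is harmless here because every Counter is freshly built by the generator, but it would be a surprise if the inputs were Counters you intended to keep.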

score 10 · Accepted Answer

from collections import Counter


a = [("13.5",100)]
b = [("14.5",100), ("15.5", 100)]
c = [("15.5",100), ("16.5", 100)]

inp = [dict(x) for x in (a,b,c)]
count = Counter()
for y in inp:
  count += Counter(y)
print(count)

Output:

Counter({'15.5': 200, '14.5': 100, '16.5': 100, '13.5': 100})

Edit: As Duncan suggested, you can replace these 3 lines with a single line:

count = Counter()
for y in inp:
  count += Counter(y)

Replace with: count = sum((Counter(y) for y in inp), Counter())

score 7 · Accepted Answer

You could use itertools' groupby:

from itertools import groupby, chain

a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input = sorted(chain(a,b,c), key=lambda x: x[0])

output = {}
for k, g in groupby(input, key=lambda x: x[0]):
  output[k] = sum(x[1] for x in g)

print output

Using groupby instead of two loops and a defaultdict will make your code clearer.
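One detail worth spelling out is that groupby only groups adjacent items, which is why the input is sorted first. The Python 3 sketch below shows what happens without the sort; `sum_by_date` is a hypothetical helper name, not part of the answer.

```python
from itertools import chain, groupby
from operator import itemgetter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

def sum_by_date(pairs):
    """Sum counts per date; `pairs` must already be sorted by date."""
    return {date: sum(count for _, count in group)
            for date, group in groupby(pairs, key=itemgetter(0))}

# Sorted input: equal dates are adjacent, so each date forms one group.
good = sum_by_date(sorted(chain(a, b, c), key=itemgetter(0)))

# Unsorted input: "15.5" shows up in two separate runs, and the later
# run overwrites the earlier one in the dict.
unsorted_pairs = [("15.5", 100), ("13.5", 100), ("15.5", 100)]
bad = sum_by_date(unsorted_pairs)

print(good["15.5"])  # 200
print(bad["15.5"])   # 100
```

The sort makes this approach O(n log n), whereas the defaultdict and Counter versions make a single pass over the data.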

score 1 · Accepted Answer

You can use Counter or defaultdict, or you can try my variant:

def merge_with(d1, d2, fn=lambda x, y: x + y):
    res = d1.copy() # "= dict(d1)" for lists of tuples
    for key, val in d2.iteritems(): # ".. in d2" for lists of tuples
        try:
            res[key] = fn(res[key], val)
        except KeyError:
            res[key] = val
    return res

>>> merge_with({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}

Or, more generically:

def make_merger(fappend=lambda x, y: x + y, fempty=lambda x: x):
    def inner(*dicts):
        res = dict((k, fempty(v)) for k, v
            in dicts[0].iteritems()) # ".. in dicts[0]" for lists of tuples
        for dic in dicts[1:]:
            for key, val in dic.iteritems(): # ".. in dic" for lists of tuples
                try:
                    res[key] = fappend(res[key], val)
                except KeyError:
                    res[key] = fempty(val)
        return res
    return inner

>>> make_merger()({'a':1, 'b':2}, {'a':3, 'c':4})
{'a': 4, 'c': 4, 'b': 2}

>>> appender = make_merger(lambda x, y: x + [y], lambda x: [x])
>>> appender({'a':1, 'b':2}, {'a':3, 'c':4}, {'b':'BBB', 'c':'CCC'})
{'a': [1, 3], 'c': [4, 'CCC'], 'b': [2, 'BBB']}
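The helpers above call dict.iteritems(), which exists only in Python 2. A possible Python 3 port of the first helper, as a sketch that keeps the same name and signature and swaps the try/except for a membership test:

```python
def merge_with(d1, d2, fn=lambda x, y: x + y):
    """Merge d2 into a copy of d1, combining values of shared keys with fn."""
    res = dict(d1)  # also accepts a list of (key, value) tuples
    for key, val in d2.items():  # items() replaces Python 2's iteritems()
        res[key] = fn(res[key], val) if key in res else val
    return res

merged = merge_with({'a': 1, 'b': 2}, {'a': 3, 'c': 4})
print(merged)  # {'a': 4, 'b': 2, 'c': 4}

# Any two-argument function can serve as the combiner, for example max.
largest = merge_with({'a': 1, 'b': 2}, {'a': 3, 'c': 4}, fn=max)
print(largest)  # {'a': 3, 'b': 2, 'c': 4}
```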

You can also subclass dict and implement an __add__ method:
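A minimal sketch of what such a subclass might look like in Python 3. The class name `SummingDict` and its behavior are assumptions made for illustration, not the answer author's implementation.

```python
class SummingDict(dict):
    """Hypothetical dict subclass whose + merges two mappings by summing values."""

    def __add__(self, other):
        result = SummingDict(self)  # copy, so neither operand is mutated
        for key, val in other.items():
            result[key] = result[key] + val if key in result else val
        return result

a = SummingDict([("13.5", 100)])
b = SummingDict([("14.5", 100), ("15.5", 100)])
c = SummingDict([("15.5", 100), ("16.5", 100)])

total = a + b + c
print(total["15.5"])  # 200

# With an empty SummingDict as the start value, sum() works as well.
print(sum([a, b, c], SummingDict())["15.5"])  # 200
```

This is essentially a hand-rolled subset of what collections.Counter already provides, so it is probably only worth doing when the values are not plain counts.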
