python - Python: collections.Counter vs defaultdict(int)

Question

Suppose I have some data like the following:

Lucy = 1
Bob = 5
Jim = 40
Susan = 6
Lucy = 2
Bob = 30
Harold = 6

I want to combine them by:

removing duplicate keys, and
adding up the values of those duplicate keys.

That means I would end up with these keys/values:

Lucy = 3
Bob = 35
Jim = 40
Susan = 6
Harold = 6

Would it be better to use a Counter or a defaultdict (both from collections) for this?

score 68 · Accepted Answer

Both Counter and defaultdict(int) work fine here, but there are a few differences between them:

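Counter supports most of the operations you can perform on a multiset, so if you want to use those operations, go with Counter.
Counter does not add a new key to the dictionary when you query a missing key. So if your queries may include keys that are not in the dictionary, Counter is the better choice.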
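As a rough illustration of the multiset operations mentioned above, the sketch below applies Counter's `+`, `-`, `&` and `|` operators to two tallies. Splitting the question's data into the two batches `first` and `second` is an assumption made only for this demo.

```python
from collections import Counter

# Hypothetical split of the question's data into two batches.
first = Counter({"Lucy": 1, "Bob": 5, "Jim": 40, "Susan": 6})
second = Counter({"Lucy": 2, "Bob": 30, "Harold": 6})

combined = first + second    # add counts key by key (this is the question's goal)
difference = first - second  # subtract; only positive results are kept
common = first & second      # intersection: minimum of each count
union = first | second       # union: maximum of each count

print(dict(combined))  # {'Lucy': 3, 'Bob': 35, 'Jim': 40, 'Susan': 6, 'Harold': 6}
print(dict(difference))
print(dict(common))
print(dict(union))
```

With a defaultdict(int) you would have to write each of these as an explicit loop over the keys.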

Example:

>>> from collections import Counter, defaultdict
>>> c = Counter()
>>> d = defaultdict(int)
>>> c[0], d[1]
(0, 0)
>>> c
Counter()
>>> d
defaultdict(<class 'int'>, {1: 0})
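If you do stick with defaultdict(int), one common way to avoid the side effect shown above is to query keys with `in` or `dict.get()` instead of indexing: only `d[key]` goes through `__missing__` and inserts the default. A small sketch:

```python
from collections import defaultdict

d = defaultdict(int)
d["Lucy"] += 1            # indexing inserts "Lucy" with 0, then adds 1

# These lookups do not call __missing__, so no new key is created.
print("Bob" in d)         # False
print(d.get("Bob", 0))    # 0
print(dict(d))            # {'Lucy': 1}

_ = d["Bob"]              # plain indexing inserts "Bob" with value 0
print(dict(d))            # {'Lucy': 1, 'Bob': 0}
```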


Counter also has a method called most_common that lets you order items by their counts. To get the same thing with a defaultdict, you have to use sorted.

Example:

>>> c = Counter('aaaaaaaaabbbbbbbcc')
>>> c.most_common()
[('a', 9), ('b', 7), ('c', 2)]
>>> c.most_common(2)          # return the 2 most common items and their counts
[('a', 9), ('b', 7)]
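For comparison, here is roughly what the sorted route looks like with a defaultdict(int). The `top_n` helper is a hypothetical name; it uses `heapq.nlargest`, which is also what CPython's pure-Python `Counter.most_common(n)` uses when `n` is given.

```python
import heapq
from collections import defaultdict
from operator import itemgetter

d = defaultdict(int)
for ch in "a" * 9 + "b" * 7 + "c" * 2:
    d[ch] += 1

# Equivalent of Counter.most_common(): sort all (key, count) pairs by count.
all_items = sorted(d.items(), key=itemgetter(1), reverse=True)

def top_n(counts, n):
    """Hypothetical helper: the n (key, count) pairs with the largest counts."""
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))

print(all_items)    # [('a', 9), ('b', 7), ('c', 2)]
print(top_n(d, 2))  # [('a', 9), ('b', 7)]
```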

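Counter also lets you expand a Counter object back into its elements: elements() returns an iterator that repeats each element as many times as its count.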

Example:

>>> c = Counter({'a':5, 'b':3})
>>> list(c.elements())
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']
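To do the same with a defaultdict(int), a minimal sketch using itertools follows. Like `elements()`, it skips keys whose count is zero or negative; the extra key `"z"` is added only to show that behaviour.

```python
from collections import Counter, defaultdict
from itertools import chain, repeat

d = defaultdict(int, {"a": 5, "b": 3, "z": 0})

# Rough equivalent of Counter.elements(): repeat each key `count` times,
# skipping keys whose count is less than one.
expanded = list(chain.from_iterable(repeat(k, n) for k, n in d.items() if n > 0))

print(expanded)                                 # ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']
print(list(Counter(d).elements()) == expanded)  # True
```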

So, depending on what you want to do with the resulting dict, you can choose between Counter and defaultdict(int).

score 4 · Accepted Answer

defaultdict(int) seems to be faster:

In [1]: from collections import Counter, defaultdict

In [2]: def test_counter():
   ...:     c = Counter()
   ...:     for i in range(10000):
   ...:         c[i] += 1
   ...:

In [3]: def test_defaultdict():
   ...:     d = defaultdict(int)
   ...:     for i in range(10000):
   ...:         d[i] += 1
   ...:

In [4]: %timeit test_counter()
5.28 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit test_defaultdict()
2.31 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
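If you want to reproduce this outside IPython, a sketch with the standard timeit module is below. Timings will vary with Python version and machine, so no numbers are claimed here. One likely reason for the gap in CPython is that Counter handles missing keys with a Python-level `__missing__`, while defaultdict is implemented in C. The third variant is an addition not in the original benchmark: `Counter(iterable)` counts everything in one call, which CPython accelerates with a C helper, so it may behave quite differently from the per-key loop.

```python
import timeit
from collections import Counter, defaultdict

N = 10_000

def count_with_counter():
    c = Counter()
    for i in range(N):
        c[i] += 1  # each new key goes through Counter.__missing__
    return c

def count_with_defaultdict():
    d = defaultdict(int)
    for i in range(N):
        d[i] += 1  # default value created by defaultdict itself
    return d

def count_with_counter_bulk():
    # Count the whole iterable in a single constructor call.
    return Counter(range(N))

if __name__ == "__main__":
    for fn in (count_with_counter, count_with_defaultdict, count_with_counter_bulk):
        # Best of 3 runs, 20 calls each, reported per call.
        seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
        print(f"{fn.__name__:>24}: {seconds * 1e3:.2f} ms per call")
```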

score 3 · Accepted Answer

I favor defaultdict(int) for summing counts, as in this case, and Counter() for counting list elements. In your case, the following would be the cleanest solution:

from collections import defaultdict

name_count = [
    ("Lucy", 1),
    ("Bob", 5),
    ("Jim", 40),
    ("Susan", 6),
    ("Lucy", 2),
    ("Bob", 30),
    ("Harold", 6)
]

aggregate_counts = defaultdict(int)
for name, count in name_count:
    aggregate_counts[name] += count
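For comparison, the same aggregation also works with Counter, since it likewise treats a missing key as 0; the loop body is identical. This sketch runs both side by side on the question's data and shows that they agree, with Counter additionally offering most_common.

```python
from collections import Counter, defaultdict

name_count = [
    ("Lucy", 1), ("Bob", 5), ("Jim", 40), ("Susan", 6),
    ("Lucy", 2), ("Bob", 30), ("Harold", 6),
]

aggregate_dd = defaultdict(int)
aggregate_counter = Counter()
for name, count in name_count:
    aggregate_dd[name] += count       # missing key starts at int() == 0
    aggregate_counter[name] += count  # missing key starts at 0 as well

print(dict(aggregate_dd))
# {'Lucy': 3, 'Bob': 35, 'Jim': 40, 'Susan': 6, 'Harold': 6}
print(aggregate_counter.most_common(2))
# [('Jim', 40), ('Bob', 35)]
```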
