python - Python collections.Counter: most_common 复杂度

Question

Python中对象most_common提供的函数的复杂度是多少？collections.Counter

更具体地说，是Counter在计数时保留某种排序列表，允许它比何时添加到计数器的（唯一）项目数most_common更快地执行操作？供您参考，我正在处理大量文本数据，试图找到第 n 个最常见的标记。O(n)n

我查看了CPython wiki 上的官方文档和TimeComplexity 文章，但找不到答案。

score 66 · Accepted Answer

从collections.py的源代码中，我们看到如果我们不指定返回元素的数量，则most_common返回一个排序的计数列表。这是一个O(n log n)算法。

如果我们使用most_common返回k > 1元素，那么我们使用heapq.nlargest. 这是一种O(k) + O((n - k) log k) + O(k log k)算法，对于小常数非常有用k，因为它本质上是线性的。O(k)部分来自对初始计数的堆放，k第二部分来自n - k对方法的调用，heappushpop第三部分来自对最终k元素堆的排序。因为k <= n我们可以得出结论，复杂性是：

O(n log k)

如果k = 1那么很容易证明复杂性是：

上）

score 12 · Accepted Answer

源代码准确显示了发生的情况：

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abracadabra').most_common(3)
    [('a', 5), ('r', 2), ('b', 2)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))

heapq.nlargest在heapq.py中定义

python - Python collections.Counter: most_common 复杂度

2 回答 2

Related

Reference