python - What is a heap queue good for?

Question

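Reading Guido's infamous answer to the question of sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the heapq module.

I also discovered that I don't understand jack about it, and I don't know what I can do with it.

Can you explain to me (with the proverbial 6-year-old as the target audience) what the heap queue algorithm is for and what you can do with it?

Can you provide a simple Python snippet where using it (with the heapq module) solves a problem that is better solved with it than with something else?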

score 10 · Accepted Answer

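heapq implements binary heaps, which are a partially sorted data structure. In particular, they support three interesting operations:

heapify turns a list into a heap, in place, in O(n) time;
heappush adds an element to the heap in O(lg n) time;
heappop removes and returns the smallest element of the heap in O(lg n) time.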
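
As a minimal sketch of those three operations (the sample values are made up for illustration):

```python
from heapq import heapify, heappush, heappop

data = [5, 1, 8, 3, 2]
heapify(data)             # rearrange data in place so that data[0] is the smallest item
heappush(data, 0)         # add a new item; the heap invariant is preserved
smallest = heappop(data)  # remove and return the smallest item, here 0

# Popping until the heap is empty yields the remaining items in ascending order.
drained = [heappop(data) for _ in range(len(data))]
print(smallest, drained)
```

Note that a heap is only partially sorted: apart from data[0] being the minimum, the order of the underlying list is not something to rely on.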

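Many interesting algorithms rely on heaps for performance. The simplest one is probably partial sorting: getting the k smallest (or largest) elements of a list without sorting the whole list. heapq.nsmallest (and nlargest) does that. The implementation of nlargest can be paraphrased as: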

from heapq import heapify, heappush, heappop

def nlargest(n, l):
    # make a (min-)heap of the first n elements
    heap = l[:n]
    heapify(heap)

    # loop over the other len(l)-n elements of l
    for i in range(n, len(l)):
        # push the current element onto the heap, so its size becomes n+1
        heappush(heap, l[i])
        # pop the smallest element off, so that the heap will contain
        # the largest n elements of l seen so far
        heappop(heap)

    return sorted(heap, reverse=True)

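Analysis: let N be the number of elements in l. heapify is run once, at a cost of O(n); that is negligible. Then, in a loop that runs N-n = O(N) times, we perform a heappush and a heappop at a cost of O(lg n) each, giving a total running time of O(N lg n). When N >> n, this is a big win compared to the other obvious algorithm, sorted(l, reverse=True)[:n], which takes O(N lg N) time.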
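
One way to check that reasoning is to compare such a function against the library. The sketch below is a variant that uses heapq.heappushpop, which combines the push and the pop into one call. The name nlargest_sketch, the seed, and the test data are assumptions made for illustration; this is not CPython's actual implementation of heapq.nlargest.

```python
import random
from heapq import heapify, heappushpop, nlargest

def nlargest_sketch(n, items):
    """Return the n largest items in descending order (illustrative only)."""
    if n <= 0:
        return []
    heap = items[:n]
    heapify(heap)                  # min-heap of the first n items
    for x in items[n:]:
        # Push x, then pop the smallest: the heap keeps the n largest seen so far.
        heappushpop(heap, x)
    return sorted(heap, reverse=True)

random.seed(0)                     # fixed seed, only to make the run repeatable
values = [random.random() for _ in range(10_000)]
assert nlargest_sketch(5, values) == nlargest(5, values)
assert nlargest_sketch(5, values) == sorted(values, reverse=True)[:5]
```

The heap never holds more than n items, which is where the O(lg n) cost per element comes from.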

score 2 · Accepted Answer

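For example: you have a set of 1000 floating-point numbers. You want to repeatedly remove the smallest item from the set and replace it with a random number between 0 and 1. The fastest way to do it is with the heapq module: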

import random
from heapq import heappush, heappop

heap = [0.0] * 1000
# heapify(heap)   # usually you need this, but not if the list is initially sorted
while True:
    x = heappop(heap)
    heappush(heap, random.random())

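Each iteration takes time logarithmic in the length of the heap (i.e. around 7 units for a list of length 1000). The other solutions take linear time (i.e. around 1000 units, which is about 140 times slower, and the gap keeps growing as the length increases):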

import random

lst = [0.0] * 1000
while True:
    x = min(lst)    # linear
    lst.remove(x)   # linear
    lst.append(random.random())

Or:

import random

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    lst.append(random.random())
    lst.sort()      # linear (in this case)

Or even:

import bisect
import random

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    bisect.insort(lst, random.random())   # linear
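
Since the loops above never terminate, a bounded variant is easier to experiment with. The sketch below runs a fixed number of rounds of the heap version and of the min/remove version with the same seed, and checks that both remove the same items in the same order. The function names, the round count, and the seed are assumptions made for illustration.

```python
import random
from heapq import heapify, heappop, heappush

def run_heap(start, rounds, seed):
    rng = random.Random(seed)
    heap = list(start)
    heapify(heap)
    removed = []
    for _ in range(rounds):
        removed.append(heappop(heap))   # O(log n)
        heappush(heap, rng.random())    # O(log n)
    return removed

def run_list(start, rounds, seed):
    rng = random.Random(seed)
    lst = list(start)
    removed = []
    for _ in range(rounds):
        x = min(lst)                    # O(n)
        lst.remove(x)                   # O(n)
        removed.append(x)
        lst.append(rng.random())
    return removed

start = [0.0] * 1000
assert run_heap(start, 2000, seed=1) == run_list(start, 2000, seed=1)
```

Wrapping each call with time.perf_counter() should make the difference in cost visible as the list grows. heapq.heapreplace(heap, item) also does the pop and the push in a single call.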
