python - 为什么二进制堆的这种实现比 Python 的 stdlib 慢？

Question

我一直在实现自己的堆模块来帮助我理解堆数据结构。我了解它们是如何工作和管理的，但是在执行堆排序时，我的实现比标准 python heapq 模块慢得多（对于大小为 100,000 的列表，heapq 需要 0.6 秒，而我的代码需要 2 秒（最初是 2.6 秒，删掉它）通过从 percDown 中取出 len() 语句并传递长度，减少到 2s，因此它不必每次方法调用自身时都计算 len)。这是我的实现：

def percDown(lst, start, end, node):
    #Moves given node down the heap, starting at index start, until the heap property is
    #satisfied (all children must be larger than their parent)
    iChild = 2 * start + 1
    i = start
    # if the node has reached the end of the heap (i.e. no children left),
    # return its index (we are done)
    if iChild > end - 1:
        return start
    #if the second child exists and is smaller than the first child, use that child index
    #for comparing later
    if iChild + 1 < end and lst[iChild + 1] < lst[iChild]:
        iChild += 1
    #if the smallest child is less than the node, it is the new parent
    if lst[iChild] < node:
        #move the child to the parent position
        lst[start] = lst[iChild]
        #continue recursively going through the child nodes of the
        # new parent node to find where node is meant to go
        i = percDown(lst, iChild, end, node)
    return i

popMin：弹出最小值（lst[0]）并重新排序堆

def popMin(lst):
    length = len(lst)
    if (length > 1):
        min = lst[0]
        ele = lst.pop()
        i = percDown(lst, 0, length - 1, ele)
        lst[i] = ele
        return min
    else:
        return lst.pop()

heapify：将列表原地变成堆

def heapify(lst):
    iLastParent = math.floor((len(lst) - 1) / 2)
    length = len(lst)
    while iLastParent >= 0:
        ele = lst[iLastParent]
        i = percDown(lst, iLastParent, length, lst[iLastParent])
        lst[i] = ele
        iLastParent -= 1

排序：使用上述方法对给定列表进行堆排序（非就地）

def sort(lst):
    result = []
    heap.heapify(lst)
    length = len(lst)
    for z in range(0, length):
        result.append(heap.popMin(lst))
    return result

我是否错误地增加了算法/堆创建的复杂性，或者仅仅是 python heapq 模块被高度优化？我有一种感觉是前者，因为 0.6s 与 2s 是一个巨大的差异。

score 6 · Accepted Answer

Pythonheapq模块使用 C 扩展。你无法击败 C 代码。

从heapq模块源代码：

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass

另请参阅_heapqmodule.cC 源代码。

score 2 · Accepted Answer

0.6s 与 2.6s 相差不到 4 倍。是不是“太大了”？

没有足够的信息来回答。算法错误肯定会导致 4 倍的差异……但如果不进行不同尺寸的测试，真的无法判断。

如果你得到，比如说，在 1000 处只有 1.2 倍的差异，在 100000 处有 4 倍的差异，在 1000000 处有 12 倍的差异，那么你的算法复杂性很可能更糟，这意味着你可能确实做错了，这就是你需要的东西修理。

另一方面，如果所有三种尺寸的差异大约为 4 倍，那么您的开销中只有一个更大的常数乘数。很可能是因为您在 Python 中运行了一个内部循环，而（CPython）stdlib 版本正在使用_heapq在 C 中执行相同循环的加速器模块，如Martijn Pieters 的回答中所述。所以，你没有做错什么。您可能可以进行一些微优化，但最终您将不得不用 C 重写代码的核心，或者在 JIT 优化的解释器中运行它，以获得与 stdlib 一样好的任何地方。真的，如果你只是为了理解算法而写这个，你不需要这样做。

作为旁注，您可能想尝试在 PyPy 中运行比较。它的大部分 stdlib 是用纯 Python 编写的，没有特殊的优化，但优化的 JIT 编译器使其几乎与 CPython 中的本机 C 代码一样快。相同的 JIT 将应用于您的代码，这意味着您的未优化代码通常也几乎与 CPython 中的本机 C 代码一样。当然，不能保证这一点，并且它不会改变这样一个事实，即如果您尝试测试算法复杂性，您总是需要以不同的大小进行测试。

python - 为什么二进制堆的这种实现比 Python 的 stdlib 慢？

2 回答 2

Related

Reference