5

我试图重新实现 heapify 方法,以便使用_siftup_siftdown更新或删除堆中的任何节点并保持 O(log(n)) 的时间复杂度。

我做了一些努力来优化我的代码,但事实证明它们比heapq.heapify (就所花费的总时间而言)更糟糕。所以我决定研究源代码。并将复制的代码与模块的代码进行比较。

# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)


def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)

a = list(reversed(range(1000000)))
b = a.copy()

import heapq

import time

cp1 = time.time()
heapq.heapify(a)
cp2 = time.time()
heapify(b)
cp3 = time.time()

print(a == b)
print(cp3 - cp2, cp2 - cp1)


而且我发现总是cp3 - cp2>=cp2 - cp1并且不一样, 即使两者都相同,heapify也需要更多的时间。heapq.heaify 在某些情况下heapify需要 3 秒和heapq.heapify0.1 秒

heapq.heapfy 模块的执行速度比相同的 heapify 更快,它们仅通过导入有所不同。

请告诉我原因,如果我犯了一些愚蠢的错误,我很抱歉。

4

1 回答 1

4

heapifyfrom模块实际上是heapq一个内置函数:

>>> import heapq
>>> heapq
<module 'heapq' from 'python3.9/heapq.py'>
>>> heapq.heapify
<built-in function heapify>

help(heapq.heapify)说:

heapify关于模块中内置函数的帮助_heapq...

所以它实际上是在导入内置模块_heapq,因此运行的是C代码,而不是 Python。

如果您heapq.py进一步滚动代码,您将看到

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass

这将覆盖函数,就像heapify他们的 C 实现一样。例如,_heapq.heapify这里

于 2021-03-02T15:33:00.487 回答