algorithm - 堆排序的下限？

Question

众所周知，堆排序的最坏情况运行时是 Ω(n lg n)，但我无法理解为什么会这样。特别是，堆排序的第一步（创建一个最大堆）需要时间 Θ(n)。然后是 n 次堆删除。我明白为什么每次堆删除都需要时间 O(lg n); 重新平衡堆涉及一个消泡操作，该操作在堆的高度上花费时间 O(h)，并且 h = O(lg n)。但是，我不明白为什么第二步应该采用 Ω(n lg n)。似乎任何单独的堆出队都不一定会导致移动到顶部的节点一直沿树冒泡。

我的问题是 - 有没有人知道堆排序的最佳情况行为的良好下限证明？

score 19 · Accepted Answer

所以我做了一些挖掘自己，看起来这个结果实际上是最近的！我能找到的第一个下界证明是从 1992 年开始的，尽管堆排序本身是在 1964 年发明的。

正式的下界证明归功于 Schaffer 和 Sedgewick 的“堆排序分析”论文。这是一个略为解释的证明版本，省略了一些技术细节。

首先，假设对于某个 k，n = 2 ^k - 1，这保证了我们有一个完整的二叉堆。稍后我将展示如何单独处理这种情况。因为我们有 2 ^k - 1 个元素，所以堆排序的第一遍将在 Θ(n) 中建立一个高度为 k 的堆。现在，考虑这个堆的前半部分出队，它删除了 2 ^k-1堆中的节点。第一个关键观察结果是，如果您使用起始堆，然后在此处标记实际上最终出队的所有节点，它们将形成堆的子树（即，每个出队的节点都有一个也出队的父节点）。您可以看到这一点，因为如果不是这种情况，那么尽管节点本身已出列，但某些节点的（较大的）父节点并未出列，这意味着值是无序的。

现在，考虑这棵树的节点是如何分布在堆中的。如果您将堆的级别标记为 0、1、2、...、k - 1，那么在级别 0、1、2、...、k - 2 中会有一些这些节点（即除了树的底层之外的所有内容）。为了让这些节点从堆中出列，它们必须向上交换到根节点，并且一次只能向上交换一层。这意味着降低堆排序运行时间的一种方法是计算将所有这些值带到根所需的交换次数。事实上，这正是我们要做的。

我们需要回答的第一个问题是——最大的 2 ^k-1个节点中有多少不在堆的底层？我们可以通过矛盾证明这不大于 2 ^{k-2 。}假设在堆的底层至少有 2 ^{k-2 + 1 个最大节点。}那么这些节点的每个父节点也必须是级别 k - 2 中的大节点。即使在最好的情况下，这意味着级别 k - 2 中必须至少有 2 ^k-3 + 1 个大节点，这意味着在第 k - 3 层中至少有 2 ^k-4 + 1 个大节点，等等。总结所有这些节点，我们得到有 2 ^k-2 + 2 ^k-3 + 2 ^k-4 + ... + 2⁰ + k 个大节点。但是这个值严格大于 2 ^k-1，这与我们在这里只使用 2 ^k-1个节点的事实相矛盾。

好的……我们现在知道底层最多有 2 ^{k-2 个大节点。}这意味着在前 k-2 层中必须至少有 2 ^{k-2个大节点。}我们现在要问——在所有这些节点上，从那个节点到根的距离的总和是多少？好吧，如果我们有 2 ^k-2个节点位于完整堆的某个位置，那么其中最多 2 ^k-3个节点可以位于前 k - 3 个级别，因此至少有 2 ^k-2 - 2 ^{k- 3} = 2 ^k-3级 k - 2 重节点。因此，需要执行的交换总数至少为 (k - 2) 2 ^k-3。由于 n = 2 ^k-1，k = Θ(lg n)，所以这个值是 Θ(n lg n) 需要的。

score 3 · Accepted Answer

简单的观察答案是这样的：堆中的项目是：

1
2
4
8
...
2^[log(n/4)]
and last level has between (1..2^[log(n/2)]) ==> (1,[n/2]) item, (by [] I mean Ceiling not roof)

例如，如果您有 7 个项目：

1
2
4

如果你有 8 个项目：

有2个不同的堆树，首先至少n / 4 - 1个堆的项目在最后一层，所以n/4 - 1在最后一层之前至少有一个项目，在第一种情况下需要O((n/4 - 1) * log(n/2))删除最后一层项目从堆中，在第二种情况下，它需要O((n/4 - 1) * log(n/4))从前一个级别中删除项目。所以在这两种情况下，它只需要 Ω(n log(n)) 用于 n/4 - 1 个项目，所以它是一个下限（很容易说它是严格的下限）。

score 1 · Accepted Answer

Here is a solution that uses CLRS terms:
We start with a max-heap that is a complete binary tree with n elements.
We can say that in a complete binary there are n/2 leaves and n/2 inner nodes.
n/2 iterations of HEAP-SORT remove the largest n/2 elements from the heap.
Let S be the set of the largest n/2 elements.
There can be at most n/4 elements from S in the leaves since there must be additional n/4 of them in the inner nodes.
Let L be these n/4 largest elements from S that are in the leaves.
So if there are n/4 elements from S at level 0 (the leaves level) then there must be at least n/8 of them at level 1.
Let P be these n/8 elements from S that are at level 1.
n/2 iterations of HEAP-SORT may give the elements from L a short cut to the root and then out of the heap, but the elements from P must make all the way to the root before they are removed from the heap.
So there are at least (n/8)(lgn-1) operations, which gives us a running time of Ω(nlgn).
Now for the case of a max-heap that doesn't have all its leaves at level 0.
Let k be the number of its leaves at level 0.
After k iterations of HEAP-SORT, we are left with a max-heap that is a complete binary tree with height lgn-1.
We can continue our proof the same way.
Now for the case when there are less than n/4 leaves from S.
Let k be the number of elements from S that are in the leaves at level 0.
If k <= n/8 then there must be at least n/8 elements from S at level 1.
This is because there can be a total of n/4 elements above level 1.
We continue the proof the same way.
If k>n/8 then there must be at least n/16 elements from S that are at level 1.
We continue the proof the same way.
We conclude that the running time of HEAP-SORT is Ω(nlgn).

algorithm - 堆排序的下限？

3 回答 3

Related