在浏览标准库中鲜为人知的部分时,我偶然发现了std::sort_heap
. 但我不明白为什么它存在,因为有一个名为std::sort
.
另请注意,复杂性是相同的。
所以,我的问题是:存在的理由是什么sort_heap
?
sort_heap
假设输入已经是堆的形式。这意味着它理论上可以比 更有效地工作std::sort
,因为输入的顺序有一些限制(不像std::sort
,它必须适用于所有输入)。
正如评论中提到的,值得注意的是,这些性能优势并没有得到保证,而且显然取决于输入数据,所以如果性能很重要,那么就没有办法绕过分析。
In the case where the data already has the heap property, there's an obvious sorting algorithm that doesn't apply to data without the property -- repeatedly remove the maximum element of the heap and restore the heap property. This is how heapsort works (first heapify the data, then use the heap property to sort it).
So, suppose that you have a heap and you want it sorted. You could call std::sort
, but std::sort_heap
exists to hint that this algorithm be used[*]. It makes at least some sense to provide the programmer with a means to potentially improve the sort performance. Whether it's actually faster or not is another matter.
Observe that std:sort
is permitted to be implemented as a heapsort, although I doubt that it ever is.
The world would go on if sort_heap
were not available, since there's another way to get the same behavior: repeatedly call pop_heap
on a smaller and smaller initial segment of your original heap. So if it troubles you, view it as a pure convenience function. It's possible there are optimizations than can be applied, though, to make sort_heap
a little better than this.
A historical note that might have affected the thinking of the authors of C++03: in the SGI version of the STL, sort
was defined to use introsort and partial_sort
was defined to use heapsort. I don't think that's exactly the rationale for including it in the standard, though: it's also an "obvious" function to include with the heap algorithms.
[*] it's a pretty strong hint, since the complexity requirement for sort_heap
is "at most N log N comparisons", not "O(N log N) comparisons". So an implementation can't have sort_heap
call sort
unless it knows that its own sort
implementation performs at most that many comparisons when the input data has the heap property.
复杂性保证实际上并不相同。
std::sort 需要 O(log N) 堆栈上的内存量。std::sort_heap 需要 O(1) 数量的堆栈。这在堆栈空间受限的环境中产生了很大的不同,例如在嵌入式应用程序中(即在微控制器上运行)。即使在几千个元素数组上调用 std::sort 也会导致堆栈溢出。
顺便说一句,在嵌入式环境中,内部存储通常是 SRAM,因此您不必担心快速排序/引入排序获得性能优势的缓存位置。
因此,在微控制器环境中,建议编写
std::make_heap(data.begin(), data.end());
std::sort_heap(data.begin(), data.end());
代替
std::sort(data.begin(), data.end());
代码大小是使用堆排序的一个很好的理由。这些是模板函数;对于被排序和比较函数的每种类型组合,您将获得一个完整的排序实现std::sort
(即,用于排序一个案例的代码的任何部分都不会与用于排序不同案例的代码共享 - 即使它是相同的类型,但是不同的比较)。
堆排序(即std::make_heap
后跟std::sort_heap
)也是如此——但生成的代码量可能要少得多,尤其是在比较运算符不是完全微不足道的情况下;我刚刚做了一些测试,std::sort
在 x86 上,我看到 2k-3k 字节的 a 和 600-1000 字节的堆排序相同的操作。
因此,如果您倾向于在不同类型上使用大量排序操作,和/或使用不同的比较函数,那么将堆排序用于倾向于在较小 N 上操作的那些可能是一个好主意;对于这些,算法效率的差异不会对其造成太大影响,并且整体代码膨胀将减少。
我怀疑与 相比,堆实现往往会在给定问题上做更多的“交换” std::sort
,所以如果你对交换成本更高的类型进行排序,它可能会明显变慢 - 对于这些情况,也许可以排序而是一个指针数组。
取自:http ://www.sgi.com/tech/stl/sort_heap.html
sort_heap 将堆 [1] [first, last) 转换为已排序的范围。请注意,这不是稳定的 > 排序:不保证保留等效元素的相对顺序。
std::sort 可能会根据实现在最坏的情况下为您提供 O(N^2) 复杂度,并适用于未排序的数据集。std::sort_heap 在堆上工作并且总是给你 O(nlogn)