c++ - sort_heap 存在的原因

Question

在浏览标准库中鲜为人知的部分时，我偶然发现了std::sort_heap. 但我不明白为什么它存在，因为有一个名为std::sort.

另请注意，复杂性是相同的。

所以，我的问题是：存在的理由是什么sort_heap？

score 12 · Accepted Answer

sort_heap假设输入已经是堆的形式。这意味着它理论上可以比更有效地工作std::sort，因为输入的顺序有一些限制（不像std::sort，它必须适用于所有输入）。

正如评论中提到的，值得注意的是，这些性能优势并没有得到保证，而且显然取决于输入数据，所以如果性能很重要，那么就没有办法绕过分析。

score 4 · Accepted Answer

In the case where the data already has the heap property, there's an obvious sorting algorithm that doesn't apply to data without the property -- repeatedly remove the maximum element of the heap and restore the heap property. This is how heapsort works (first heapify the data, then use the heap property to sort it).

So, suppose that you have a heap and you want it sorted. You could call std::sort, but std::sort_heap exists to hint that this algorithm be used[*]. It makes at least some sense to provide the programmer with a means to potentially improve the sort performance. Whether it's actually faster or not is another matter.

Observe that std:sort is permitted to be implemented as a heapsort, although I doubt that it ever is.

The world would go on if sort_heap were not available, since there's another way to get the same behavior: repeatedly call pop_heap on a smaller and smaller initial segment of your original heap. So if it troubles you, view it as a pure convenience function. It's possible there are optimizations than can be applied, though, to make sort_heap a little better than this.

A historical note that might have affected the thinking of the authors of C++03: in the SGI version of the STL, sort was defined to use introsort and partial_sort was defined to use heapsort. I don't think that's exactly the rationale for including it in the standard, though: it's also an "obvious" function to include with the heap algorithms.

[*] it's a pretty strong hint, since the complexity requirement for sort_heap is "at most N log N comparisons", not "O(N log N) comparisons". So an implementation can't have sort_heap call sort unless it knows that its own sort implementation performs at most that many comparisons when the input data has the heap property.

score 3 · Accepted Answer

复杂性保证实际上并不相同。

std::sort 需要 O(log N) 堆栈上的内存量。std::sort_heap 需要 O(1) 数量的堆栈。这在堆栈空间受限的环境中产生了很大的不同，例如在嵌入式应用程序中（即在微控制器上运行）。即使在几千个元素数组上调用 std::sort 也会导致堆栈溢出。

顺便说一句，在嵌入式环境中，内部存储通常是 SRAM，因此您不必担心快速排序/引入排序获得性能优势的缓存位置。

因此，在微控制器环境中，建议编写

std::make_heap(data.begin(), data.end());
std::sort_heap(data.begin(), data.end());

代替

std::sort(data.begin(), data.end());

score 2 · Accepted Answer

代码大小是使用堆排序的一个很好的理由。这些是模板函数；对于被排序和比较函数的每种类型组合，您将获得一个完整的排序实现std::sort（即，用于排序一个案例的代码的任何部分都不会与用于排序不同案例的代码共享 - 即使它是相同的类型，但是不同的比较）。

堆排序（即std::make_heap后跟std::sort_heap）也是如此——但生成的代码量可能要少得多，尤其是在比较运算符不是完全微不足道的情况下；我刚刚做了一些测试，std::sort在 x86 上，我看到 2k-3k 字节的 a 和 600-1000 字节的堆排序相同的操作。

因此，如果您倾向于在不同类型上使用大量排序操作，和/或使用不同的比较函数，那么将堆排序用于倾向于在较小 N 上操作的那些可能是一个好主意；对于这些，算法效率的差异不会对其造成太大影响，并且整体代码膨胀将减少。

我怀疑与相比，堆实现往往会在给定问题上做更多的“交换” std::sort，所以如果你对交换成本更高的类型进行排序，它可能会明显变慢 - 对于这些情况，也许可以排序而是一个指针数组。

score 0 · Accepted Answer

取自：http ://www.sgi.com/tech/stl/sort_heap.html

sort_heap 将堆 [1] [first, last) 转换为已排序的范围。请注意，这不是稳定的 > 排序：不保证保留等效元素的相对顺序。

std::sort 可能会根据实现在最坏的情况下为您提供 O(N^2) 复杂度，并适用于未排序的数据集。std::sort_heap 在堆上工作并且总是给你 O(nlogn)

c++ - sort_heap 存在的原因

5 回答 5

Related

Reference