c++ - Introsort（快速排序+堆排序）实现和复杂性

Question

我读过 C++ 对其内置的 std::sort 使用 introsort（内省排序），它从快速排序开始，当你达到深度限制时切换到堆排序。

我还读到深度限制应该是 2*log(2,N)。

这个值纯粹是实验性的吗？或者它背后有什么数学理论？

score 2 · Accepted Answer

If you have an interval (range or array), the number of times you'll have to split the interval in half before you end up with an empty (or one element) interval is log(2,N), that's just a mathematical fact, you can work it out easily, if you want. If all goes perfectly well with quicksort, it should recurse log(2,N) times, for the same reason (and at each recursion level, it has to process all values of the interval, which leads to a O(N*log(2,N)) complexity for the overall algorithm). The problem is that quicksort could require many more recursions (if it keeps getting "unlucky" with picking pivot values, which means that it doesn't split the interval in half, but in an imbalanced way instead). At worse, quicksort could end up recursing N times, which is definitely not acceptable for a production-quality implementation.

Switching to heap-sort at 2*log(2,N) is just a good heuristic in general, to detect a much too deep number of recursions.

Technically, you could base this on the empirical performance of heap-sort versus quick-sort, to figure out what limit is the best. But such tests are highly dependent on the application (what are you sorting? how are you comparing elements? how cheap are the element swaps? etc..). So, most one-size-fits-all implementation, like std::sort, would just pick a reasonable limit like 2*log(2,N).

score 0 · Accepted Answer

@Mikael Persson 关于深度限制为何为 2*log(2,N) 所说的部分正确。这不仅仅是一个很好的启发式方法，或者一个合理的限制。

事实上，正如您可能已经猜到的（从您的第二个问题中描述），这有一个重要的数学原因：在波浪符号 (搜索波浪符号)中，快速排序平均进行~2*log(2,N)比较。在大哦符号中，这相当于O(N*log(2,N))。

这就是为什么当递归深度超过 2*log(2,N) 时 introsort 切换到堆排序（具有渐近 O(N*log(2,N)) 复杂度）。您可以将其视为通常不会发生的事情，并且很可能意味着仅枢轴选择和快速排序出现问题会导致 O(N^2) 复杂性。

您可以在此处找到快速排序平均比较次数的简短数学证明（幻灯片 21）。

c++ - Introsort（快速排序+堆排序）实现和复杂性

2 回答 2

Related

Reference