c++ - Quicksort with "median of three" pivot selection: understanding the process

Question

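We are covering quicksort (using arrays) in class. I've been struggling to wrap my head around how they want our quicksort assignment to work with the "median of three" pivot selection method. I just need a high-level explanation of how it works. Our textbook doesn't help, and I'm having a hard time finding a clear explanation by googling.

This is what I think I understand so far:

The "median of three" function takes the elements at index 0 (first), array_end_index (last), and (index 0 + array_end_index)/2 (middle). It works out which of those three indices holds the median of the three values, and returns that index.

The function parameters are as follows: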

/* @param left
*       the left boundary for the subarray from which to find a pivot
* @param right
*       the right boundary for the subarray from which to find a pivot
* @return
*       the index of the pivot (middle index); -1 if provided with invalid input
*/
int QS::medianOfThree(int left, int right){}

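Then, in the "partition" function, the number whose index matches the value returned by the "median of three" function acts as the pivot. My assignment states that in order to continue partitioning the array, the pivot must lie between the left and right boundaries. The problem is that our "median of three" function returns one of three indices: the first, the middle, or the last. Only one of those three indices (the middle) can be "between" anything.

The function parameters are as follows: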

/* @param left
*       the left boundary for the subarray to partition
* @param right
*       the right boundary for the subarray to partition
* @param pivotIndex
*       the index of the pivot in the subarray
* @return
*       the pivot's ending index after the partition completes; -1 if
*       provided with bad input
*/
int QS::partition(int left, int right, int pivotIndex){}

What am I misunderstanding?

Here are the full descriptions of the functions:

/*
* sortAll()
*
* Sorts elements of the array.  After this function is called, every
* element in the array is less than or equal its successor.
*
* Does nothing if the array is empty.
*/
void QS::sortAll(){}

/*
* medianOfThree()
*
* The median of three pivot selection has two parts:
*
* 1) Calculates the middle index by averaging the given left and right indices:
*
* middle = (left + right)/2
*
* 2) Then bubble-sorts the values at the left, middle, and right indices.
*
* After this method is called, data[left] <= data[middle] <= data[right].
* The middle index will be returned.
*
* Returns -1 if the array is empty, if either of the given integers
* is out of bounds, or if the left index is not less than the right
* index.
*
* @param left
*       the left boundary for the subarray from which to find a pivot
* @param right
*       the right boundary for the subarray from which to find a pivot
* @return
*       the index of the pivot (middle index); -1 if provided with invalid input
*/
int QS::medianOfThree(int left, int right){}

/*
* Partitions a subarray around a pivot value selected according to
* median-of-three pivot selection.
*
* The values which are smaller than the pivot should be placed to the left
* of the pivot; the values which are larger than the pivot should be placed
* to the right of the pivot.
*
* Returns -1 if the array is null, if either of the given integers is out of
* bounds, or if the first integer is not less than the second integer, OR IF THE
* PIVOT IS NOT BETWEEN THE TWO BOUNDARIES.
*
* @param left
*       the left boundary for the subarray to partition
* @param right
*       the right boundary for the subarray to partition
* @param pivotIndex
*       the index of the pivot in the subarray
* @return
*       the pivot's ending index after the partition completes; -1 if
*       provided with bad input
*/
int QS::partition(int left, int right, int pivotIndex){}

score 4 · Accepted Answer

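First understand quicksort, then median of three.

To perform a quicksort, you:

1. Pick an item from the array you are sorting (any item will do, but which one is best is something we'll come back to).
2. Reorder the array so that all items less than the one you picked come before it, and all items greater than it come after it.
3. Recursively do the above to the sets before and after the item you picked.

Step 2 is called the "partition operation". Suppose you have the following: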

3 2 8 4 1 9 5 7 6

Now say you picked the first of those numbers as your pivot element (the one we choose in step 1). After we apply step 2, we end up with something like:

2 1 3 4 8 9 5 7 6

The value 3 is now in its correct place, and every other element is on the correct side of it. If we now sort the left-hand side, we end up with:

1 2 3 4 8 9 5 7 6.

Now, let's consider just the elements to the right of it:

4 8 9 5 7 6.

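If we pick 4 as the next pivot, we end up changing nothing, because it was in the correct position to begin with. The set of elements to its left is empty, so there is nothing to do there. We now need to sort the set: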

8 9 5 7 6.

If we use 8 as our pivot, we end up with:

5 7 6 8 9.

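The 9 on the right is now a single element, so it is obviously already sorted. That leaves 5 7 6 to be sorted. If we pivot on 5 we end up leaving it alone, and we just need to sort 7 6 into 6 7.

Now, putting all those changes back into the wider context, what we end up with is: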

1 2 3 4 5 6 7 8 9.

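So to summarise again: quicksort picks one item, moves the other elements around it so that they are all correctly positioned relative to that item, and then does the same thing recursively to the two remaining sets, until there are no unsorted blocks left and everything is sorted.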
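
The three steps can be sketched in C++ roughly as follows. This is only a minimal sketch, not the assignment's QS class: the names quickSort and partitionFirst are made up for illustration, it works on a std::vector<int>, and it always pivots on the first element of the range, as the walkthrough does. Several partition schemes are valid, so the intermediate orderings it produces may differ from the ones shown above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: partitions a[left..right] (inclusive) around a[left]
// and returns the index where the pivot ends up.
int partitionFirst(std::vector<int>& a, int left, int right) {
    assert(left >= 0 && left <= right && right < static_cast<int>(a.size()));
    const int pivot = a[left];
    int boundary = left;  // a[left+1..boundary] holds the values < pivot found so far
    for (int i = left + 1; i <= right; ++i) {
        if (a[i] < pivot) {
            ++boundary;
            std::swap(a[boundary], a[i]);
        }
    }
    std::swap(a[left], a[boundary]);  // step 2 done: the pivot is in its final place
    return boundary;
}

// Hypothetical helper: sorts a[left..right] (inclusive).
void quickSort(std::vector<int>& a, int left, int right) {
    if (left >= right) return;                      // zero or one element: already sorted
    const int p = partitionFirst(a, left, right);   // steps 1 and 2
    quickSort(a, left, p - 1);                      // step 3, left set
    quickSort(a, p + 1, right);                     // step 3, right set
}
```

Calling quickSort(v, 0, 8) on a vector holding 3 2 8 4 1 9 5 7 6 leaves it as 1 2 3 4 5 6 7 8 9.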

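Let's come back to the thing I glossed over when I said "any item will do". While it's true that any item will do, which item you pick affects performance. If you're lucky, you'll end up doing this in a number of operations proportional to n log n, where n is the number of elements. If you're reasonably lucky, it'll be a slightly bigger number that is still proportional to n log n. If you're really unlucky, it'll be a number proportional to n².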
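
To make the lucky/unlucky distinction concrete, here is a small sketch that counts comparisons for a quicksort that always pivots on the first element. countComparisons is a hypothetical helper written for this illustration, not part of the assignment. On input that is already sorted, every partition puts all the remaining elements on one side, so the count comes out as n(n-1)/2, which is the n² case.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: sorts a[left..right] using first-element pivots and
// returns how many element-versus-pivot comparisons were made along the way.
long long countComparisons(std::vector<int>& a, int left, int right) {
    if (left >= right) return 0;
    assert(left >= 0 && right < static_cast<int>(a.size()));
    const int pivot = a[left];
    int boundary = left;
    long long comparisons = 0;
    for (int i = left + 1; i <= right; ++i) {
        ++comparisons;  // one comparison of a[i] against the pivot
        if (a[i] < pivot) std::swap(a[++boundary], a[i]);
    }
    std::swap(a[left], a[boundary]);
    return comparisons + countComparisons(a, left, boundary - 1)
                       + countComparisons(a, boundary + 1, right);
}
```

For a sorted vector of 100 elements this returns 4950 (100·99/2). For the seven elements 4 2 1 3 6 5 7, where every pivot happens to split its range evenly, it returns 11, compared with 21 for seven elements that are already sorted.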

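So which item is the best one to pick? The best is the item that will end up in the middle once the partition operation is done. But we don't know which item that is, because to find the middle item we would have to sort all the items, and that is what we were trying to do in the first place.

So, there are a few strategies we can take:

1. Just go for the first one, because, well, why not?
2. Go for the middle one, because the array may already be sorted or nearly sorted for some reason, and if it isn't, the middle is no worse a choice than any other.
3. Pick one at random.
4. Take the first, the middle and the last, and go for the median of those three, because it is at least going to be the best of those three options.
5. Take the median of three for the first third of the array, the median of three for the second third and the median of three for the last third, and then go with the median of those three medians.

These have different pros and cons, but on the whole each of these options gives a better pivot than the one before it, at the cost of spending more time and effort on picking that pivot. (Random has the further advantage of defeating cases where someone deliberately crafts data that triggers your worst-case behaviour, as part of some sort of DoS attack.)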

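"My assignment states that in order to continue partitioning the array, the pivot must lie between the left and right boundaries."

Yes. Consider the situation above again, after we had sorted 3 into the correct place and sorted the left side: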

1 2 3 4 8 9 5 7 6.

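Now we need to sort the range 4 8 9 5 7 6. The boundaries are the line between the 3 and the 4, and the line between the 6 and the end of the array (or, looking at it another way, the boundaries are the 4 and the 6, but they are inclusive boundaries that contain those items). The three we pick are therefore the 4 (first), the 6 (last), and either the 9 or the 5, depending on whether we round down or up when dividing by 2 (we'd probably round down, as that is what integer division usually does, so the 9). All of these are inside the boundaries of the partition we are currently sorting. Our median of three is therefore the 6 (or, if we did round up, we'd go for the 5).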
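
The selection just described can be sketched as a function that returns the index of the median among the first, middle and last elements of the subrange. medianOfThreeIndex and its roundUp flag are hypothetical names used only for illustration. Note that this variant merely picks an index and does not reorder anything, which is different from the medianOfThree in the question's documentation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the index (left, middle or right) whose value
// is the median of a[left], a[middle] and a[right]. Bounds are inclusive.
int medianOfThreeIndex(const std::vector<int>& a, int left, int right,
                       bool roundUp = false) {
    assert(left >= 0 && left <= right && right < static_cast<int>(a.size()));
    // Integer division rounds down; adding 1 first rounds up instead.
    const int middle = left + (right - left + (roundUp ? 1 : 0)) / 2;
    const int x = a[left], y = a[middle], z = a[right];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return middle;
    if ((y <= x && x <= z) || (z <= x && x <= y)) return left;
    return right;
}
```

With the array 1 2 3 4 8 9 5 7 6 and the inclusive bounds 3 and 8, rounding down looks at 4, 9 and 6 and returns index 8 (the 6); rounding up looks at 4, 5 and 6 and returns index 6 (the 5). Either way the returned index lies within the bounds.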

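(Incidentally, a magical perfect pivot-picker that always chose the best pivot would only ever pick the 6 or the 7 here, so picking 6 is pretty good. There will still be times when median of three is unlucky and ends up with the third-worst option, or perhaps even an arbitrary choice among 3 identical elements that are all the worst. It is just much less likely to happen than with the other approaches.)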

score 1 · Accepted Answer

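Computing the "median of three" is a way of getting a pseudo-median element of your array and using its index as the pivot for your partition. It is a simple way of getting a rough estimate of the array's median, which results in better performance.

Why is this useful? Because in theory you want the partition value to be the true median of your array, so that when you quicksort the array the pivot splits it evenly, which is what gives you quicksort's O(N log N) sort time.

Example: your array is: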

[5,3,1,7,9]

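The median of three looks at 5, 1 and 9. The median of those is obviously 5, so that is the pivot value we want to use for quicksort's partition function. What you can do next is swap the element at this index with the last one and get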

[9,3,1,7,5]

Now we try to put all values smaller than 5 to the left of the middle and all values larger than 5 to the right of the middle. We now get

[1,3,7,9,5]

Swap the last element (where we stored the partition value) with the element at the boundary between the two groups

[1,3,5,9,7]

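That is the idea behind using the median of 3. Imagine if our partition value had been 1 or 9 instead: the array we would end up with is not a good case for quicksort.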
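
The steps in this example can be sketched as below. partitionAroundPivotAtEnd is a hypothetical name, and the scheme (park the pivot in the last slot, partition everything before it, then swap the pivot into the boundary position) is just one common way of doing it. The intermediate arrays it produces may differ from the ones shown above, but the pivot ends up in the same place.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: partitions a[left..right] (inclusive) around the value
// at pivotIndex and returns the index where that pivot ends up.
int partitionAroundPivotAtEnd(std::vector<int>& a, int left, int right,
                              int pivotIndex) {
    assert(left >= 0 && left <= right && right < static_cast<int>(a.size()));
    assert(left <= pivotIndex && pivotIndex <= right);
    std::swap(a[pivotIndex], a[right]);  // park the pivot in the last slot
    const int pivot = a[right];
    int store = left;                    // next slot for a value smaller than the pivot
    for (int i = left; i < right; ++i) {
        if (a[i] < pivot) std::swap(a[store++], a[i]);
    }
    std::swap(a[store], a[right]);       // move the pivot to the boundary
    return store;
}
```

For [5,3,1,7,9] with pivotIndex 0 this returns 2: the 5 lands at index 2, with 3 and 1 somewhere to its left and 7 and 9 somewhere to its right.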

score 1 · Accepted Answer

The documentation for medianOfThree says:

* 2) Then bubble-sorts the values at the left, middle, and right indices.
*
* After this method is called, data[left] <= data[middle] <= data[right].
* The middle index will be returned.

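So your description doesn't match the documentation. What you are meant to do is sort the first, middle and last elements of the data in place, and always return the middle index.

Hence the pivot index is guaranteed to lie between the boundaries (unless the middle ends up on a boundary...).

Even then, there is nothing wrong with pivoting on a boundary...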
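
A minimal sketch of what the documentation describes might look like this. The QS class's data members are not shown in the question, so this version is a free function that takes a std::vector<int> called data as an extra parameter; that parameter is an assumption made for illustration. The -1 checks follow the documentation comment.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sorts data[left], data[middle], data[right] in place and returns middle,
// or -1 on invalid input. The data parameter is an assumption; the real
// member function presumably uses the class's own array.
int medianOfThree(std::vector<int>& data, int left, int right) {
    const int size = static_cast<int>(data.size());
    if (data.empty() || left < 0 || right >= size || left >= right) return -1;
    const int middle = (left + right) / 2;
    // Bubble sort on three elements: compare-and-swap (left, middle),
    // then (middle, right), then (left, middle) again.
    if (data[left] > data[middle]) std::swap(data[left], data[middle]);
    if (data[middle] > data[right]) std::swap(data[middle], data[right]);
    if (data[left] > data[middle]) std::swap(data[left], data[middle]);
    assert(data[left] <= data[middle] && data[middle] <= data[right]);
    return middle;
}
```

With left = 0 and right = 1 the middle index is 0, which is the case where the returned pivot sits on a boundary rather than strictly between the two.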
