java - 当数组可能包含也可能不包含枢轴元素时就地分区

Question

是否存在不依赖于数组中存在的枢轴元素的就地分区算法（在快速排序实现中使用的那种）？

换句话说，数组元素必须按以下顺序排列：

小于枢轴的元素（如果有）
等于枢轴的元素（如果有）
大于枢轴的元素（如果有）

如果它恰好出现在数组中，它仍然必须返回枢轴元素的索引（排序后），否则返回一个特殊值；这可能是索引的补码，可以在其中插入元素以保持顺序（如 Java 标准二进制搜索函数的返回值。）

我所看到的实现要求将枢轴元素的索引作为参数给出（或始终位于数组的开头）。不幸的是，我事先不知道枢轴是否存在于数组中（或者它在哪里在数组中。）

编辑（回复meriton的评论）：我们也可以假设以下条件之一为真：

数组长度 < 2，或
至少一个元素是<= pivot ，至少一个元素是>= pivot。

数组中可能存在重复值（包括枢轴值的重复值。）

score 1 · Accepted Answer

这是一个有趣的问题。您可以通过数组的单个顺序传递来做到这一点。下面是 C# 中的代码示例。它假定一个名为的整数数组a和一个pivot值。

// Skip initial items that are < pivot
int iInsert = 0;
while (iInsert < a.Length && a[iInsert] < pivot)
{
    ++iInsert;
}
// Skip items that are = pivot
int numPivot = 0;
while (iInsert < a.Length && a[iInsert] == pivot)
{
    ++iInsert;
    ++numPivot;
}

int iCurrent = iInsert;
// Items will be added AFTER iInsert.
// Note that iInsert can be -1.
--iInsert;
while (iCurrent < a.Length)
{
    if (a[iCurrent] < pivot)
    {
        if (numPivot == 0)
        {
            ++iInsert;
            int temp = a[iInsert];
            a[iInsert] = a[iCurrent];
            a[iCurrent] = temp;
        }
        else
        {
            ++iInsert;
            int temp = a[iInsert];
            a[iInsert - numPivot] = a[iCurrent];
            a[iCurrent] = temp;
            a[iInsert] = pivot;
        }
    }
    else if (a[iCurrent] == pivot)
    {
        ++iInsert;
        int temp = a[iInsert];
        a[iInsert] = pivot;
        a[iCurrent] = temp;
        ++numPivot;
    }
    ++iCurrent;
}

int firstPivot = iInsert - numPivot + 1;

可能有一些优化机会。

这种方法的有趣之处在于，您可以轻松地将其调整为从传入数据流构建。您不必知道有多少商品即将到来。只需使用可以动态调整大小的列表。当最后一个项目进入时，您的列表是正确的。

score 0 · Accepted Answer

你很幸运：上个月编码 kata 是为了实现快速排序。这是我想出的：

/**
 * Sorts the elements with indices i such that l <= i < r
 */
private static void qsort(int[] a, int left, int right) {
    int l = left;
    int r = right - 1;

    if (l >= r) {
        return;
    }

    int pivot = a[l];
    l++;
    for (;;) {
        while (l <= r && a[l] <= pivot) l++;
        while (a[r] > pivot  && l < r) r--;

        if (l < r) {
            int t = a[l];
            a[l] = a[r];
            a[r] = t;
        } else {
            break;
        }
    }
    l--;
    a[left] = a[l];
    a[l] = pivot;

    qsort(a, left, l);
    qsort(a, r, right);
}

如您所见，该算法仅使用枢轴的原始位置来查找枢轴的值，并将枢轴交换为分区之间的索引。

如果我们不知道枢轴存在，我们将简单地处理等于枢轴的值，就像值 < 枢轴一样，也就是说，我们不会将小于、等于和大于枢轴的三个组中的元素划分为小于或等于枢轴和大于枢轴的两组，并在每个分区上递归。这个解决方案是正确的。

但是，将不再保证终止：已知 QuickSort 会终止，因为每个递归步骤使用的数组切片都比其调用者更短，并且数组切片已知更短，因为它们不包含枢轴元素。对于您修改后的算法，这不再适用。事实上，很容易看出终止将取决于您的枢轴值选择策略。

score 0 · Accepted Answer

另一种可能性是将方法分成两部分，一个划分为 [<= pivot, > pivot]，另一个将结果的第一部分划分为 [< pivot, >= pivot]。

public static int partitionLE(double[] a, int left, int right, double pivot) {
    double x, y;
    if (left >= right) return left;
    for (;;) {
        while ((x = a[left]) <= pivot) {
            if (++ left >= right) return left;
        }
        while ((y = a[right-1]) > pivot) {
            if (left >= -- right) return left;
        }
        if (left < right) {
            a[left] = y;
            a[right-1] = x;
        } else {
            return left;
        }
    }
}
public static int partitionLT(double[] a, int left, int right, double pivot) {
    double x, y;
    if (left >= right) return left;
    for (;;) {
        while ((x = a[left]) < pivot) {
            if (++ left >= right) return left;
        }
        while ((y = a[right-1]) >= pivot) {
            if (left >= -- right) return left;
        }
        if (left < right) {
            a[left] = y;
            a[right-1] = x;
        } else {
            return left;
        }
    }
}
public static int partition(double[] a, int left, int right, double pivot) {
    int lastP = partitionLE(a, left, right, pivot);
    int firstP = partitionLT(a, left, lastP, pivot);
    if (firstP < lastP) {
        return firstP;
    } else {
        return ~firstP;
    }
}

java - 当数组可能包含也可能不包含枢轴元素时就地分区

3 回答 3

Related

Reference