java - Median of Medians algorithm not working consistently

Question

I have implemented the select/median of medians algorithm using the following as a reference http://www.ics.uci.edu/~eppstein/161/960130.html (this has previously been linked here Median of Medians in Java).

My code seems to work for small arrays (~100) and even works for arrays of size 100001 http://pastebin.com/mwRc4Hig (answer 5008), but then fails on an input array of size 10001 http://pastebin.com/YwVBmgDk (answer 4960, my code outputs 4958).

Note that the correct answers for the texts above are equivalent to sorting the array and returning the element at array[array.length / 2], regardless of whether the array size is even or odd.

I'm not sure how to debug this issue. The functionality seems arbitrary and I'm just lost. Here below is my code:

public class MedianOfMedians {

public static void main(String[] args) {
    MedianOfMedians mds = new MedianOfMedians();
    mds.run();
}

private void run() {
    Scanner in = new Scanner(System.in);
    int n = in.nextInt();
    int[] numArray = new int[n];
    for (int i = 0; i < n; i++) {
        numArray[i] = in.nextInt();
    }
    int median = select(numArray, numArray.length / 2);
    System.out.print(median);
}

private int select(int[] numArray, int k) {
    if (numArray.length <= 10) {
        int[] sorted = insertionSort(numArray);
        return sorted[k];
    }

    int divCount = (numArray.length % 5 == 0) ? numArray.length / 5 - 1 : numArray.length / 5;
    int[] medOfMed = new int[divCount + 1];
    int counter = 0;
    int[] subArray;

    while (counter <= divCount) {
        subArray = splitByFive(counter, divCount, numArray);
        medOfMed[counter] = select(subArray, subArray.length / 2);
        counter++;
    }

    int M = select(medOfMed, numArray.length / 10);

    List<Integer> lt = new ArrayList<>();
    List<Integer> eq = new ArrayList<>();
    List<Integer> gt = new ArrayList<>();
    for (int i : numArray) {
        if (i < M) {
            lt.add(i);
        } else if (i == M) {
            eq.add(i);
        } else {
            gt.add(i);
        }
    }
    if (k < lt.size()) {
        return select(createArray(lt), k);
    } else if (k > lt.size() + eq.size()) {
        return select(createArray(gt), k - lt.size() - eq.size());
    } else {
        return M;
    }
}

private int[] splitByFive(int splitIter, int divisions, int[] toSplit) {
    int numToCopy;
    if (splitIter == divisions) {
        numToCopy = toSplit.length - (5 * splitIter);
    } else {
        numToCopy = 5;
    }
    int[] subArray = new int[numToCopy];
    System.arraycopy(toSplit, splitIter * 5, subArray, 0, numToCopy);
    return subArray;
}

private int[] createArray(List<Integer> list) {
    int[] result = new int[list.size()];
    for (int i = 0; i < list.size(); i++) {
        result[i] = list.get(i);
    }
    return result;
}

private int[] insertionSort(int[] numArray) {
    for (int i = 1; i < numArray.length; i++) {
        int j = i;
        while (j - 1 >= 0 && numArray[j] < numArray[j - 1]) {
            int temp = numArray[j];
            numArray[j] = numArray[j - 1];
            numArray[j - 1] = temp;
            j--;
        }
    }
    return numArray;
}
}

score 2 · Accepted Answer

我没有时间调试你的代码，但也许我可以提供一种调试技术让你自己尝试，这对像这样的递归算法很有用。

如果存在算法失败的输入（并且正如您所发现的那样），那么就有一个最小的这样的输入——这个输入越小，就越容易找出问题所在。因为该算法是递归的，所以您有一个很好的方法来隔离出问题的第一个地方：您可以测试您即将返回的结果是否select()正确（使用缓慢、受信任的方法，例如将数据复制到临时缓冲区，对其进行排序，然后在返回值之前抓取中途元素）。return如果您重新排列函数以仅使用单个语句，这样做会容易得多，例如：

private int select(int[] numArray, int k) {
    int knownCorrectAnswer = selectSlowlyButDefinitelyCorrectly(numArray, k);
    int willReturn;
    if (numArray.length <= 10) {
        int[] sorted = insertionSort(numArray);
        willReturn = sorted[k];    // Just remember what we will return
    } else {    // Need to add else branch here now

        ...

        if (k < lt.size()) {
            willReturn = select(createArray(lt), k);
        } else if (k > lt.size() + eq.size()) {
            willReturn = select(createArray(gt), k - lt.size() - eq.size());
        } else {
            willReturn = M;
        }
    }    // End of inserted else branch

    if (willReturn == knownCorrectAnswer) {
        return willReturn;
    } else {
        yell("First problem occurs with numArray=<...> and k=<...>!");
    }
}

yell()应该打印出整个问题实例并停止程序（例如通过抛出异常）。这个设置的好处是你知道当yell()被调用时，已经完成的每个调用select()都是正确的——因为如果不是，yell()就会已经被调用并且程序会在现在之前停止。所以产生的输出yell()保证是第一个（不一定是最小的，但通常也是）出现问题的子问题。

java - Median of Medians algorithm not working consistently

1 回答 1

Related

Reference