c - Something in the source causes platform-dependent results

Question

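I wrote a merge sort algorithm in C and compiled it both locally and remotely. I assume something in the source code is causing the platform dependence.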

#include <stdio.h>
#include <limits.h>
#include <math.h>

void merge(int A[], int p, int q, int r) {
    int n1 = q - p + 1;
    int n2 = r - q;

    int L[n1];
    int R[n2];

    for (int i = 0; i < n1; ++i) {
        L[i] = A[p + i];
    }

    for(int j = 0; j < n2; ++j) {
        R[j] = A[q + j + 1];
    }

    L[n1] = INT_MAX;
    R[n2] = INT_MAX;

    int i = 0;
    int j = 0;

    for (int k = p; k <= r; ++k) {
        if (L[i] <= R[j]) {
            A[k] = L[i];
            i = i + 1;
        } else {
            A[k] = R[j];
            j = j + 1;
        }
    }
}

void merge_recurse(int A[], int p, int r) {
    if (p < r) {
        int q = floor((p + r) / 2);
        merge_recurse(A, p, q);
        merge_recurse(A, q + 1, r);
        merge(A, p, q, r);
    }
}

void merge_sort(int A[], size_t length) {
    merge_recurse(A, 0, (int)length - 1);
}

int main() {
    int length = 9;
    int A[] = { 3, 7, 61, 3, 40, 4, -1, 8, 10 };

    merge_sort(A, length);

    for (int i = 0; i < length; ++i) {
        printf("%i, ", A[i]);
    }

    return 0;
}

When compiled online, it returns the correct result.

-1, 3, 3, 4, 7, 8, 10, 40, 61,

However, when I compile the source locally on Linux, it returns an incorrect result.

-1, 4, 8, 10, 2147483647, 3, 7, 40, 61

What in the source code causes these different results?

score 2 · Accepted Answer

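L[n1] = INT_MAX; writes past the end of the array. The declaration int L[n1]; creates an array that can be indexed from 0 to n1-1. There is no L[n1]. The same applies to R[n2] = INT_MAX;

When code writes past the end of an array, it may trample another variable in a way that changes the program's behavior. Or it may have no observable effect at all. The online compiler happened to lay out the variables in memory in such a way that nothing bad happened. This is completely unpredictable and is known as undefined behavior.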
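
As a minimal sketch of the smallest change that keeps the question's sentinel approach in bounds, each temporary array can reserve one extra slot for the sentinel. The names merge_with_sentinels and sort_range are hypothetical, chosen here for illustration. The sketch assumes a compiler with variable-length array support and, importantly, that INT_MAX never appears in the input.

```c
#include <assert.h>
#include <limits.h>

/* Merge the sorted runs A[p..q] and A[q+1..r] (inclusive bounds, as in the question).
   Assumption: no element of A equals INT_MAX, otherwise the sentinels are ambiguous. */
void merge_with_sentinels(int A[], int p, int q, int r) {
    assert(p <= q && q < r);
    int n1 = q - p + 1;
    int n2 = r - q;

    /* One extra element per array so that index n1 (resp. n2) is a valid slot. */
    int L[n1 + 1];
    int R[n2 + 1];

    for (int i = 0; i < n1; ++i) {
        L[i] = A[p + i];
    }
    for (int j = 0; j < n2; ++j) {
        R[j] = A[q + 1 + j];
    }

    L[n1] = INT_MAX;  /* in bounds now: valid indices are 0..n1 */
    R[n2] = INT_MAX;  /* in bounds now: valid indices are 0..n2 */

    int i = 0;
    int j = 0;
    for (int k = p; k <= r; ++k) {
        if (L[i] <= R[j]) {
            A[k] = L[i++];
        } else {
            A[k] = R[j++];
        }
    }
}

/* Recursive driver over the inclusive range A[p..r]. */
void sort_range(int A[], int p, int r) {
    if (p < r) {
        int q = p + (r - p) / 2;
        sort_range(A, p, q);
        sort_range(A, q + 1, r);
        merge_with_sentinels(A, p, q, r);
    }
}
```

Calling sort_range(A, 0, 8) on the nine-element array from the question sorts it without writing outside L or R.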

score 0 · Accepted Answer

The code has undefined behavior: both L[n1] = INT_MAX; and R[n2] = INT_MAX; write beyond the end of the respective arrays.

Undefined behavior may have no visible effects, which is what you observe on the online compiler, or it may produce incorrect results, as you see on your Linux system, or catastrophic ones, such as a program crash or worse.

Note that your implementation uses a confusing convention and an unsafe method:

passing r as the index of the last element is much less idiomatic in C than passing the index of the element just past the end. With this alternative convention, the length of the slice is simply r - p, the initial call is merge_recurse(A, 0, length); and there are no confusing +1/-1 adjustments.
using sentinels at the end of the temporary arrays is unsafe because the value INT_MAX may actually be present in the array, causing an incorrect result or undefined behavior as index i or j may increase beyond the end of its array. This method should not be taught in schools; it is fundamentally flawed. Just test the index boundaries.
instead of using int q = floor((p + r) / 2); you should use integer arithmetic, which avoids a pointless round trip through double and cannot overflow the way p + r can:
```
int q = p + (r - p) / 2;
```
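
The last two points can be sketched in isolation. This is an illustrative fragment, not the only way to write it: midpoint and merge_bounded are hypothetical helper names, and merge_bounded writes into a separate caller-supplied output buffer instead of merging in place.

```c
#include <assert.h>
#include <limits.h>  /* INT_MAX, for callers that want to try extreme values */
#include <stddef.h>

/* Midpoint of the half-open range [p, r). The sum p + r is never formed,
   so the result is correct even when both indices are close to INT_MAX. */
int midpoint(int p, int r) {
    assert(0 <= p && p <= r);
    return p + (r - p) / 2;
}

/* Merge sorted a[0..na) and sorted b[0..nb) into out[0..na+nb).
   No sentinels: exhaustion is detected by testing the indices,
   so any int value, including INT_MAX, may appear in the input. */
void merge_bounded(const int *a, size_t na, const int *b, size_t nb, int *out) {
    size_t i = 0;
    size_t j = 0;
    for (size_t k = 0; k < na + nb; ++k) {
        /* Take from a if it still has elements and either b is exhausted
           or the current element of a is not greater than that of b. */
        if (i < na && (j >= nb || a[i] <= b[j])) {
            out[k] = a[i++];
        } else {
            out[k] = b[j++];
        }
    }
}
```

For instance, merging { 1, INT_MAX } with { INT_MAX, INT_MAX } through merge_bounded reads only valid elements, whereas a sentinel-based merge could run an index past the end of a temporary array on such input.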

Here is a modified version:

#include <stdio.h>

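/* Merge the sorted slices A[p..q) and A[q..r); r is one past the last element. */
void merge(int A[], int p, int q, int r) {
    int n1 = q - p;  /* length of the left slice */
    int n2 = r - q;  /* length of the right slice */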

    int L[n1];
    int R[n2];

    for (int i = 0; i < n1; ++i) {
        L[i] = A[p + i];
    }

    for (int j = 0; j < n2; ++j) {
        R[j] = A[q + j];
    }

    int i = 0;
    int j = 0;

    for (int k = p; k < r; ++k) {
        /* take from L while it has elements and R is exhausted or L[i] <= R[j] */
        if (i < n1 && (j >= n2 || L[i] <= R[j])) {
            A[k] = L[i];
            i++;
        } else {
            A[k] = R[j];
            j++;
        }
    }
}

void merge_recurse(int A[], int p, int r) {
    if (r - p > 1) {
        int q = p + (r - p) / 2;
        merge_recurse(A, p, q);
        merge_recurse(A, q, r);
        merge(A, p, q, r);
    }
}

void merge_sort(int A[], size_t length) {
    merge_recurse(A, 0, length);
}

int main() {
    int length = 9;
    int A[] = { 3, 7, 61, 3, 40, 4, -1, 8, 10 };

    merge_sort(A, length);

    for (int i = 0; i < length; ++i) {
        printf("%i, ", A[i]);
    }
    printf("\n");

    return 0;
}
