algorithm - 我们如何通过归纳证明二分查找是正确的？

Question

我很难理解如何使用归纳法和一些不变量来证明算法的正确性。即，如何找到不变量，何时使用归纳假设——尤其是对于二分搜索？我还没有找到一个直观的回应，所以我希望有人可以在这里对这个话题有所了解。

score 13 · Accepted Answer

假设二分查找是这样定义的：

def binary_search(a,b,e,x):
  n = e-b
  if n==0: return None
  m = b + int(n/2)
  if x<a[m]: return binary_search(a,b,m,x)
  if x>a[m]: return binary_search(a,m+1,e,x)
  return m

在哪里

a 是值数组——假设已排序
[b,e) 是从 b 到 e 的范围，包括 b 但不包括我们正在搜索的 e。
x 是我们正在搜索的值

该函数的目标是返回 i 其中 a[i]==x 如果存在这样的 i 值，否则返回 None。

binary_search 适用于大小为零的范围：

证明：如果范围不包含任何元素，则 n==0 并且函数返回 None，这是正确的。

假设 binary_search 适用于从 0 到 n 的任意大小的元素范围，则二分搜索适用于大小为 n+1 的元素范围。

证明：

因为数组是有序的，如果x < a[m]，那么对于所有k > m，x < a[k]。这意味着我们只需要搜索范围 [b,m)。范围 [b,m) 必然小于范围 [b,e)，并且我们假设二分查找适用于所有小于 n+1 的范围，因此它适用于 [b,m)。

如果 x > a[m]，则适用类似的逻辑。

如果 x==a[m]，那么函数将返回 m，这是正确的。

score 5 · Accepted Answer

假设排序后的数组是a[0...n]. 二分查找的工作原理是将这个数组递归地分成三部分，中间元素m，左边部分是所有元素<= m（因为数组是按假设排序的），右边部分是所有元素>=m（同样，因为数组按假设排序）。

如何制定不变量？

让我们首先考虑一下二分搜索是如何工作的。如果键（正在搜索的项目）是，k那么我将它与中间元素进行比较m。

如果k = m，我找到了我的物品（无事可做）
如果k < m 那么我确定如果k出现在中a，则它不能出现在数组的正确部分，因为数组该部分上的所有元素都是>= m > k. 所以它必须出现（如果出现）在数组的左侧部分。
如果k > m 那我肯定知道......

那么，在这样的递归计算的每一步，我们能保证什么呢？在每个步骤中，我们都可以识别两个索引i, j并声称“如果k它是其中的一个元素，a[0...n]那么它必须出现在两者之间i, j”。这是不变量，因为它适用于所有递归步骤，并且每一步都挤压此范围i, j，直到找到您的项目或此范围变为空（范围在时为非空i < j）。

感应是如何工作的？

对于您采取的基本情况i = 0, j = n。不变量是微不足道的。
对于归纳步骤，假设不变量适用于某个递归步骤pwhere i = i_p & j = j_p。然后你必须证明对于下一个递归步骤，i, j更新使得不变量仍然成立。这是您必须使用上述步骤 2) 和 3) 中的参数的地方。
范围i, j严格递减，因此计算必须终止。

我错过了什么吗？

score 3 · Accepted Answer

/** Return an index of x in a.
 *  Requires: a is sorted in ascending order, and x is found in the array a
 *  somewhere between indices left and right.
 */
int binsrch(int x, int[] a, int left, int right) {
  while (true) {
    int m = (left+right)/2;
    if (x == a[m]) return m;
    if (x < a[m])
    r = m−1;
    else
    l = m+1;
  }
}

关键的观察是 binsrch 以一种分而治之的方式工作，只在某种程度上“更小”的参数调用自己。

假设 P(n) 是 binsrch 对 right-left = n 的输入正确工作的断言。如果我们可以证明 P(n) 对所有 n 都是正确的，那么我们就知道 binsrch 对所有可能的参数都有效。

基本情况。在n=0的情况下，我们知道left=right=m。由于我们假设该函数只会在 x 在左右之间找到时才会被调用，所以 x = a[m] 的情况一定是这样，因此该函数将返回 m，即数组 a 中 x 的索引。

感应步骤。我们假设 binsrch 只要 left-right ≤ k 就可以工作。我们的目标是证明它适用于 left-right = k + 1 的输入。存在三种情况，其中 x = a[m]，其中 x < a[m] 和 x > a[m]。

    Case x = a[m]. Clearly the function works correctly.

    Case x < a[m]. We know because the array is sorted that x must be found between a[left] and a[m-1]. The n for the recursive call is n = m − 1 − left = ⌊(left+right)/2⌋ − 1 − left. (Note that ⌊x⌋ is the floor of x, which rounds it down toward negative infinity.) If left+right is odd, then n = (left + right − 1)/2 − 1 − left = (right − left)/2 − 1, which is definitely smaller than right−left. If left+right is even then n = (left + right)/2 − 1 − left = (right−left)/2, which is also smaller than k + 1 = right−left because right−left = k + 1 > 0. So the recursive call must be to a range of a that is between 0 and k cells, and must be correct by our induction hypothesis.

    Case x > a[m]. This is more or less symmetrical to the previous case. We need to show that r − (m + 1) ≤ right − left. We have r − (m + 1) − l = right − ⌊(left + right)/2⌋ − 1. If right+left is even, this is (right−left)/2 − 1, which is less than right−left. If right+left is odd, this is right− (left + right − 1)/2 − 1 = (right−left)/2 − 1/2, which is also less than right−left. Therefore, the recursive call is to a smaller range of the array and can be assumed to work correctly by the induction hypothesis.

因为在所有情况下，归纳步骤都有效，我们可以得出结论 binsrch（及其迭代变体）是正确的！

请注意，如果我们在编码 x > a[m] 的情况下出错，并且将 m 作为 left 而不是 m+1 传递（很容易做到！），那么我们刚刚构建的证明在这种情况下将失败。事实上，当右 = 左 + 1 时，算法可能会进入无限循环。这显示了仔细归纳推理的价值。

参考：http ://www.cs.cornell.edu/Courses/cs211/2006sp/Lectures/L06-Induction/binary_search.html

score 1 · Accepted Answer

你应该证明在每一步二分查找之后arr[first] <= element <= arr[last]

从这个和终止您可以得出结论，一旦二进制搜索终止arr[first] == element == arr[last]

algorithm - 我们如何通过归纳证明二分查找是正确的？

4 回答 4

Related

Reference