c - How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

Question

I have a question and I tried to think over it again and again... but got nothing so posting the question here. Maybe I could get some view-point of others, to try and make it work...

The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs ODD number of times. We need to find the solution in log n time.

It is easy to find the solution in O(n) time, but it looks pretty tricky to perform in log n time.

score 30 · Accepted Answer

定理：这个问题的每个确定性算法在最坏的情况下探测Ω(log ² n) 个内存位置。

证明（以更正式的方式完全重写）：

令 k > 0 为奇数且令 n = k ²。我们描述了一个强制 (log ₂ (k + 1)) ² = Ω(log ² n) 探测的对手。

我们称相同元素的最大子序列为组。对手的可能输入由 k 个长度为 k 的段x ₁ x ₂ … x _k组成。对于每个段 x _j，存在一个整数 b _j ∈ [0, k]，使得 x _j由j - 1的 b _{j个副本和 j 的 k - b}_j个副本组成。每组最多重叠两个段，每个段最多重叠两个组。

Group boundaries
|   |     |   |   |
 0 0 1 1 1 2 2 3 3
|     |     |     |
Segment boundaries

只要增加了两个，我们就按照惯例假设一个双边界。

Group boundaries
|     ||       |   |
 0 0 0  2 2 2 2 3 3

声明：第 j^个组边界 (1 ≤ j ≤ k) 的位置由段 x _j唯一确定。

证明：就在第 ((j - 1) k + b _j )^个内存位置之后，并且 x _j唯一确定 b _j。//

我们说该算法已经观察到第 j^个组边界，以防它对 x j 的探测结果_唯一地确定 x _j。按照惯例，总是观察输入的开始和结束。该算法可以在不观察组边界的情况下唯一确定组边界的位置。

Group boundaries
|   X   |   |     |
 0 0 ? 1 2 2 3 3 3
|     |     |     |
Segment boundaries

仅给定 0 0 ?，算法无法确定是否 ? 是 0 还是 1。然而，在上下文中，？必须是 1，否则会有三个奇数组，并且可以推断 X 处的组边界。这些推论对于对手来说可能是有问题的，但事实证明，只有在所讨论的群体边界“不相关”之后才能做出这些推论。

声明：在算法执行期间的任何给定点，考虑它观察到的组边界集。恰好一对连续的对处于奇数距离，奇数组位于它们之间。

证明：每隔一个连续的对只限制偶数组。//

将由特殊连续对限定的奇数长度子序列定义为相关子序列。

声明：相关子序列内部没有唯一确定的组边界。如果存在至少一个这样的边界，则奇数组的身份不是唯一确定的。

证明：不失一般性，假设不在相关子序列中的每个内存位置都已被探测过，并且相关子序列中包含的每个段恰好有一个未被探测过的位置。假设第 j^个组边界（称为 B）位于相关子序列的内部。根据假设，对 x _j的探测可以确定 B 的位置，最多有两个连续的可能性。我们称距左侧观察边界奇数距离的一个为奇数左，另一个为奇数右. 对于这两种可能性，我们从左到右工作并固定每个剩余内部组边界的位置，以便其左侧的组是偶数。（我们可以这样做，因为它们每个都有两个连续的可能性。）如果 B 在奇数左边，那么它左边的组就是唯一的奇数组。如果 B 在右奇数，则相关子序列中的最后一个组是唯一奇数组。两者都是有效输入，因此该算法既没有唯一确定 B 的位置，也没有唯一确定奇数组。//

例子：

Observed group boundaries; relevant subsequence marked by […]
[             ]   |
 0 0 Y 1 1 Z 2 3 3
|     |     |     |
Segment boundaries

Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1

由于这种说法，算法，无论它如何工作，都必须将相关子序列缩小到一组。根据定义，它因此必须遵守一些群体边界。对手现在的简单任务是尽可能多地保持开放。

在算法执行期间的任何给定时间点，攻击者在内部致力于为相关子序列之外的每个内存位置提供一种可能性。一开始，相关子序列是整个输入，因此没有初始承诺。每当算法探测到 x _j的未提交位置时，攻击者必须提交以下两个值之一：j - 1 或 j。如果它可以避免让第 j^个边界被观察，它会选择一个值，该值至少留下剩余可能性的一半（相对于观察）。否则，它选择将至少一半的组保持在相关间隔中，并为其他组提交值。

这样，对手强制算法至少观察 log ₂ (k + 1) 个组边界，并且在观察第 j^个组边界时，强制算法至少进行 log ₂ (k + 1) 个探测。

扩展：

该结果通过随机化输入直接扩展到随机算法，将“最多减半”（从算法的角度来看）替换为“期望最多减半”，并应用标准浓度不等式。

它还扩展到没有组可以大于 s 个副本的情况；在这种情况下，下限是Ω(log n log s)。

score 15 · Accepted Answer

A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simple means an odd number of elements. We can do comparison by observing the index of the first or last element of the group. The first element will be an even index (0-based) before the odd group, and an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).

PROOF OF O((log N)²)

  T(2) = 1 //to make the summation nice
  T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements

For some N=2^k,

T(2^k) = (log 2^k) + T(2^(k-1))
       = (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
       = (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
       = k + (k-1) + (k-2) + ... + 1
       = k(k+1)/2
       = (k² + k)/2
       = (log(N)² + log(N))/ 2
       = O(log(N)²)

score 11 · Accepted Answer

查看数组的中间元素。通过几个适当的二进制搜索，您可以找到数组中的第一个和最后一个出现。例如，如果中间元素是'a'，你需要找到i和j，如下所示：

[* * * * a a a a * * *]
         ^     ^ 
         |     |
         |     |
         i     j

是j - i偶数吗？你完成了！否则（这是这里的关键），要问的问题是偶数还是奇数？你明白这条知识意味着什么吗？然后剩下的就简单了。

score 6 · Accepted Answer

此答案支持“throwawayacct”发布的答案。他应得的赏金。我在这个问题上花了一些时间，我完全相信他的证明是正确的，你需要 Ω(log(n)^2) 查询来找到出现奇数次的数字。我深信不疑，因为我只是在略读了他的解决方案后就重新创建了完全相同的论点。

在解决方案中，对手创建一个输入，使算法变得困难，但对人类分析员来说也很简单。输入由 k 个页面组成，每个页面都有 k 个条目。条目总数为 n = k^2，重要的是 O(log(k)) = O(log(n)) 和 Ω(log(k)) = Ω(log(n))。为了进行输入，攻击者制作了一个长度为 k 的字符串，格式为 00...011...1，转换位于任意位置。然后将字符串中的每个符号扩展为 aa...abb...b 形式的长度为 k 的页面，其中在第 i 个页面上，a=i 和 b=i+1。每个页面上的转换也处于任意位置，除了奇偶校验与扩展页面的符号一致。

了解分析算法最坏情况的“对手方法”很重要。对手回答有关算法输入的查询，而不承诺未来的答案。答案必须是一致的，当对手已经被锁定到足以让算法得出结论时，游戏就结束了。

在此背景下，以下是一些观察结果：

1）如果您想通过在该页面中进行查询来了解页面中转换的奇偶性，则必须了解转换的确切位置，并且需要 Ω(log(k)) 查询。任何查询集合都将转换点限制为一个区间，并且任何长度大于 1 的区间都有两个奇偶校验。在该页面中最有效的转换搜索是二分搜索。

2）最微妙和最重要的一点：有两种方法可以确定特定页面内的过渡奇偶性。您可以在该页面中进行足够多的查询以找到转换，或者如果您在较早和较晚的页面中都找到相同的奇偶校验，则可以推断奇偶校验。无法逃避这种非此即彼的情况。任何一组查询都将每个页面中的转换点限制在某个时间间隔内。对奇偶校验的唯一限制来自长度为 1 的间隔。否则，转换点可以自由摆动以获得任何一致的奇偶校验。

3）在对手方法中，没有幸运的罢工。例如，假设您在某个页面中的第一个查询是朝向一端而不是在中间。由于对手没有承诺答案，他可以自由地将过渡放在多头。

4) 最终结果是你被迫直接探测 Ω(log(k)) 页面中的奇偶校验，每个子问题的工作也是 Ω(log(k))。

5) 随机选择并不比对抗性选择好多少。数学更复杂，因为现在您可以获得部分统计信息，而不是严格的“是”，您知道奇偶校验或“否”，您不知道。但这没什么区别。例如，您可以给每个页面长度 k^2，这样很可能每个页面中的第一个 log(k) 查询几乎不会告诉您该页面中的奇偶校验。对手可以在开始时做出随机选择，并且仍然有效。

score 5 · Accepted Answer

从数组的中间开始向后走，直到你得到一个与中心不同的值。检查该边界上方的数字是奇数还是偶数。如果它是奇数，那么出现奇数次的数字在左边，所以在开始和找到的边界之间重复搜索。如果它是偶数，那么出现奇数次的数字必须在数组中的后面，所以在右半部分重复搜索。

如前所述，这具有对数和线性分量。如果你想让整个事情保持对数，而不是只是向后遍历数组到不同的值，你想使用二进制搜索来代替。除非您期望多次重复相同的数字，否则二进制搜索可能不值得。

score 3 · Accepted Answer

我有一个在 log(N/C)*log(K) 中工作的算法，其中 K 是最大同值范围的长度，C 是正在搜索的范围的长度。

该算法与之前发布的大多数算法的主要区别在于它利用了所有相同值范围都很短的情况。它不是通过对整个数组进行二分搜索来找到边界，而是首先通过跳回 1、2、4、8、...（log(K) 次迭代）步骤快速找到粗略估计，然后对结果范围（再次为 log(K)）。

算法如下（用C#编写）：

// Finds the start of the range of equal numbers containing the index "index", 
// which is assumed to be inside the array
// 
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
    int candidate = index;
    int value = arr[index];
    int step = 1;

    // find the boundary for binary search:
    while(candidate>=0 && arr[candidate] == value)
    {
        candidate -= step;
        step *= 2;
    }

    // binary search:
    int a = Math.Max(0,candidate);
    int b = candidate+step/2;
    while(a+1!=b)
    {
        int c = (a+b)/2;
        if(arr[c] == value)
            b = c;
        else
            a = c;
    }
    return b;
}

// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
    if(arr[start] == arr[end-1])
        return end;

    int middle = (start+end)/2;

    int rangeStart = findRangeStart(arr,middle);

    if((rangeStart & 1) == 0)
        return search(arr, middle, end);
    return search(arr, start, rangeStart);
}

// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
    return search(arr, 0, arr.Length);
}

score 2 · Accepted Answer

取中间元素 e。使用二分搜索查找第一次和最后一次出现。O(log(n)) 如果是奇数则返回 e。否则，递归到具有奇数个元素的一侧 [....]eeee[....]

运行时间将为 log(n) + log(n/2) + log(n/4).... = O(log(n)^2)。

score 1 · Accepted Answer

啊哈。有一个答案。

进行二分搜索，并在搜索每个值时向后移动，直到找到具有相同值的第一个条目。如果它的索引是偶数，它在奇数之前，所以向右移动。
如果它的数组索引是奇数，它在奇数球之后，所以向左移动。

在伪代码中（这是一般想法，未经测试......）：

    private static int FindOddBall(int[] ary)
    {
        int l = 0,
            r = ary.Length - 1;
        int n = (l+r)/2;
        while (r > l+2)
        {
            n = (l + r) / 2;
            while (ary[n] == ary[n-1])
                n = FindBreakIndex(ary, l, n);
            if (n % 2 == 0) // even index we are on or to the left of the oddball 
                l = n;
            else            // odd index we are to the right of the oddball
                r = n-1;
        }
        return ary[l];
    }
    private static int FindBreakIndex(int[] ary, int l, int n)
    {
        var t = ary[n];
        var r = n;
        while(ary[n] != t || ary[n] == ary[n-1])
            if(ary[n] == t)
            {
                r = n;
                n = (l + r)/2;
            }
            else
            {
                l = n;
                n = (l + r)/2;
            }
        return n;
    }

score 1 · Accepted Answer

您可以使用此算法：

int GetSpecialOne(int[] array, int length)
{
   int specialOne = array[0];

   for(int i=1; i < length; i++)
   {
      specialOne ^= array[i];
   }
   return specialOne;
}

在http://www.technicalinterviewquestions.net上可以找到的类似问题的帮助下解决了

score 0 · Accepted Answer

The clue is you're looking for log(n). That's less than n.

Stepping through the entire array, one at a time? That's n. That's not going to work.

We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.

So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.

Keep going from there.

score 0 · Accepted Answer

使用哈希表

For each element E in the input set
    if E is set in the hash table
         increment it's value
    else        
         set E in the hash table and initialize it to 0

For each key K in hash table
    if K % 2 = 1
        return K

由于这个算法是 2n 它属于 O(n)

score 0 · Accepted Answer

我们没有关于数组内部长度分布以及整个数组的任何信息，对吧？

因此，arraylength 可能是 1、11、101、1001 或其他东西，至少 1 没有上限，并且必须包含至少 1 种类型的元素（“数字”），最多为 (length-1)/2 + 1 个元素，对于 1、11、101 的总大小：1、1 到 6、1 到 51 个元素等等。

我们是否应该假设所有可能大小的等概率？这将导致大小为 / 4 的子数组的中间长度，不是吗？

一个大小为 5 的数组可以分为 1、2 或 3 个子列表。

如果我们深入细节，似乎显而易见的事情并不那么明显。

一个大小为 5 的数组可以以一种方式“划分”为一个子列表，称其为“划分”是有争议的权利。它只是一个包含 5 个元素的列表 (aaaaa)。为避免混淆，我们假设列表中的元素是有序字符，而不是数字（a、b、c、...）。

分为两个子列表，它们可能是 (1, 4), (2, 3), (3, 2), (4, 1)。(abbbb, aabbb, aaabb, aaaab)。

现在让我们回顾一下之前提出的主张：是否应该假设“划分”（5）与将 4 个划分为 2 个子列表的概率相同？或者我们应该将它们混合在一起，并假设每个分区都是均匀可能的，（1/5）？

或者我们可以在不知道子列表长度概率的情况下计算解决方案吗？

score 0 · Accepted Answer

尝试这个：

int getOddOccurrence(int ar[], int ar_size)
{
     int i;
     int xor = 0; 
     for (i=0; i < ar_size; i++)     
        xor = xor ^ ar[i];

     return res;
}

每次异或时，XOR 都会抵消相同的数字，因此 1^1=0 但 1^1^1=1 所以每一对都应该抵消，而将奇数排除在外。

score -2 · Accepted Answer

您可以创建一个累积数组并计算每个数字出现的次数，然后在累积数组中找到奇数的元素。例子：

int a[]=new int[]{2,3,4,2,3,1,4,5,6,5,6,7,1};
int b[]=new int[1000];
for (int i=0;i<b.length;i++) {
    b[i]=0;
}
for (Int i=0;i<a.length;i++) {
    b[a[i]]++;
}
for (int i=0;i<b.length;i++) {
    if ( b[i]!=0) {
        if (b[i] %2==0) {
          system.out.println(i);  break;

    }
}

score -3 · Accepted Answer

假设索引从 0 开始。二分查找最小的偶数 i，使得 x[i] != x[i+1]; 你的答案是 x[i]。

编辑：由于公众需求，这里是代码

int f(int *x, int min, int max) {
  int size = max;
  min /= 2;
  max /= 2;
  while (min < max) {
    int i = (min + max)/2;
    if (i==0 || x[2*i-1] == x[2*i])
      min = i+1;
    else
      max = i-1;
  }
  if (2*max == size || x[2*max] != x[2*max+1])
    return x[2*max];
  return x[2*min];
}

c - How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

15 回答 15

Related

Reference