algorithm - Find all duplicate and missing values in a sorted array

Question

Suppose you have an array holding a sorted range of values (x to y).

x = 3;
y = 11;

array == 3, 4, 5, 6, 7, 8, 9, 10, 11

But some values may be duplicated and some may be missing, so you might have:

array == 4, 5, 5, 5, 7, 8, 9, 10, 10

What is the best way, in your language of choice, to find all the duplicate and missing values, so that you get:

resultMissingValuesArray == 3, 6, 11
resultDuplicatesArray == 5, 5, 10

Here is some C++ code to get you started:

#include <vector>
#include <iostream>
#include <algorithm>

using namespace std;

const int kLastNumber = 50000; // last number expected in array
const int kFirstNumber = 3; // first number expected in array

int main()
{
    vector<int> myVector;

    // fill up vector, skip values at the beginning and end to check edge cases
    for(int x = kFirstNumber + 5; x < kLastNumber - 5; x++)
    {   
        if(x % 12 != 0 &&  x % 13 != 0 && x % 17 != 0)
            myVector.push_back(x);  // skip some values

        else if(x % 9 == 0)
        {
            myVector.push_back(x);  // add duplicates
            myVector.push_back(x);  
        }

        else if(x % 16 == 0)
        {
            myVector.push_back(x);  // add multiple duplicates
            myVector.push_back(x);  
            myVector.push_back(x);  
            myVector.push_back(x);  
        }
    }

    // put the results in here
    vector<int> missingValues;
    vector<int> duplicates;

    //  YOUR CODE GOES HERE         

    // validate missingValues for false positives
    for(int x = 0; x < (int) missingValues.size(); ++x)
    {
        if(binary_search(myVector.begin(), myVector.end(), missingValues.at(x)))
            cout << "Oh noes! You missed an unmissed value. Something went horribly, horribly wrong.";
    }

    // validate duplicates (I think... errr)
    vector<int>::iterator vecItr = myVector.begin();
    vector<int>::iterator dupItr = duplicates.begin();

    while(dupItr < duplicates.end())
    {
        vecItr = adjacent_find(vecItr, myVector.end());     

        if(*vecItr != *dupItr)
            cout << "Oh noes! Something went horribly, horribly wrong.";

        // oh god
        while(++dupItr != duplicates.end() && *(--dupItr) == *(++dupItr) && *vecItr == *(++vecItr));            

        ++vecItr;
    }

    return 0;
}

I haven't tested the validation parts much, so they may have problems (especially the duplicates part).

I will post my own solution as an answer.

score 2 · Accepted Answer

My favorite, Python, very simple:

from functools import reduce  # reduce is no longer a builtin in Python 3

x = 3
y = 11
array = [ 3, 4, 5, 6, 7, 8, 9, 10, 11 ]
test  = [ 4, 5, 5, 5, 7, 8, 9, 10, 10 ]

resultMissingValuesArray = set(range(x,y+1)).difference(test)        
resultDuplicatesArray = reduce(lambda i,j: i+j, [[n]*(test.count(n)-1) for n in set(test) if test.count(n)>1], [])

Duplicates can be found more simply with this one line:

resultDuplicatesArray = [n for n in set(test) if test.count(n)>1]
# [5, 10] - just numbers, that have duplicates
# you can use test.count(5) for number of duplicates
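
Calling test.count(n) inside a comprehension rescans the list for every distinct value. As a rough sketch of an alternative, assuming the standard-library collections.Counter is acceptable, the following counts everything in one pass and repeats each duplicate once per extra occurrence, matching the 5, 5, 10 form asked for in the question. The function name missing_and_duplicates is made up for this illustration.

```python
from collections import Counter


def missing_and_duplicates(values, x, y):
    """Return (missing, duplicates) for values expected to cover x..y."""
    counts = Counter(values)  # one pass over the input
    # A Counter reports 0 for keys it has never seen.
    missing = [n for n in range(x, y + 1) if counts[n] == 0]
    # Emit each value once per extra occurrence, in ascending order.
    duplicates = [n for n in sorted(counts) for _ in range(counts[n] - 1)]
    return missing, duplicates


test = [4, 5, 5, 5, 7, 8, 9, 10, 10]
print(missing_and_duplicates(test, 3, 11))  # ([3, 6, 11], [5, 5, 10])
```

This version does not filter duplicates to the x..y range; add that check if out-of-range values can appear.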

score 2 · Accepted Answer

Ruby:

x = 3
y = 11
array  = [ 4, 5, 5, 5, 7, 8, 9, 10, 10 ]

resultMissingValuesArray = (x..y).to_a - array
resultDuplicatesArray = array.select { |e| array.index(e) != array.rindex(e) }.uniq

score 1 · Accepted Answer

Since you've tagged this as language-agnostic, here is the algorithm I would use.

# Get numbers and sort them in ascending order.

input x,y;
input number[1..n];
sort number[1..n];

# Set dups and missing to empty sets.

dups = [];
missing = [];

# Get edge cases.

if number[1] > x:
    foreach i x .. number[1] - 1:
        missing.add(i)
if number[n] < y:
    foreach i number[n] + 1 .. y:
        missing.add(i)

# Process all numbers starting at second one.

foreach i 2 .. n:
    # If number same as last and not already in dups set, add it.

    if number[i] == number[i-1] and not dups.contains(number[i]):
        if number[i] >= x and number[i] <= y:
            dups.add(number[i])

    # If number not last number plus one, add all between the two
    #   to missing set.

    if number[i] != number[i-1] + 1:
        foreach j number[i-1] + 1 .. number[i] - 1:
            if j >= x and j <= y:
                missing.add(j)
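
The pseudocode above can be sketched in Python roughly as follows. This is one possible reading rather than a definitive port: it assumes the input is already sorted, folds the two edge cases into the main loop by tracking the next expected value, and records each duplicated value once, as the pseudocode's dups set does. The name scan_sorted is hypothetical.

```python
def scan_sorted(numbers, x, y):
    """Single pass over a sorted list; returns (missing, dups)."""
    missing, dups = [], []
    expected = x  # smallest value in x..y not yet accounted for
    prev = None
    for n in numbers:
        if n < x or n > y:
            continue  # ignore values outside the expected range
        if n == prev:
            # Record each duplicated value only once.
            if not dups or dups[-1] != n:
                dups.append(n)
        else:
            missing.extend(range(expected, n))  # gap before n, if any
            expected = n + 1
        prev = n
    missing.extend(range(expected, y + 1))  # gap after the last value
    return missing, dups


print(scan_sorted([4, 5, 5, 5, 7, 8, 9, 10, 10], 3, 11))  # ([3, 6, 11], [5, 10])
```

Because the gap at the front and the gap at the end are handled by the same expected counter, the missing list comes out in ascending order without a separate sort.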

score 1 · Accepted Answer

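I think you can do this quickly in C++ by setting up a second array that tracks which elements have been found, incrementing the matching entry each time an element is seen. So: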

unsigned array[] = {3,4,5,6,7,8,9,10,11};
unsigned array_size = 9;
unsigned test[] = {4,5,5,5,7,8,9,10,10};

// Find the maximum element in array
// This might not be necessary if it's given somewhere
unsigned max = 0;
unsigned min = -1; // wraps around to the largest unsigned value
for(unsigned i = 0; i < array_size; i++){
    if(array[i] > max)    max = array[i];
    if(array[i] < min)    min = array[i];
}

// Go make a counts vector to store how many examples of each value there are
vector< unsigned > counts(max+1, 0);
for(unsigned i = 0; i < array_size; i++)
    counts[test[i]]++;

// Gather the unique elements, duplicates and missing elements
vector< unsigned > unique;
vector< unsigned > duplicates;
vector< unsigned > missing;
for(unsigned i = min; i < max + 1; i++){
    switch(counts[i]){
        case 0 : missing.push_back(i);    break;
        case 1 : unique.push_back(i);     break;
        default: duplicates.push_back(i);
    }
}

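This only works when the numbers in the array are non-negative (they are used as indices into counts), which is usually the case. The nice part is that it scales linearly with the number of elements, which is useful :-)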
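
As a hedged illustration of the same counting idea, here is a minimal Python sketch that offsets every index by x, which lifts the non-negative restriction. It assumes x and y are known up front; the function name count_occurrences is invented for this example. The cost is one pass over the input plus one pass over the y - x + 1 counters.

```python
def count_occurrences(values, x, y):
    """Counting approach: counts[i] holds the occurrences of the value x + i."""
    counts = [0] * (y - x + 1)
    for v in values:
        if x <= v <= y:         # skip anything outside the expected range
            counts[v - x] += 1  # offset by x so negative values also work
    missing = [x + i for i, c in enumerate(counts) if c == 0]
    unique = [x + i for i, c in enumerate(counts) if c == 1]
    duplicates = [x + i for i, c in enumerate(counts) if c > 1]
    return missing, unique, duplicates


print(count_occurrences([4, 5, 5, 5, 7, 8, 9, 10, 10], 3, 11))
# ([3, 6, 11], [4, 7, 8, 9], [5, 10])
```

Unlike the adjacent-element scan, this does not require the input to be sorted, at the price of memory proportional to the size of the range.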

score 0 · Accepted Answer

if(myVector.front() > kFirstNumber)
    for(int x = kFirstNumber; x < myVector.at(0); ++x)
        if(x >= kFirstNumber && x <= kLastNumber)
            missingValues.push_back(x);

for(int x = 1; x < (int) myVector.size(); ++x)
{
    if(myVector.at(x) == myVector.at(x - 1))
        // range-check the value, not the index x
        if(myVector.at(x) >= kFirstNumber && myVector.at(x) <= kLastNumber)
            duplicates.push_back(myVector.at(x));

    if(myVector.at(x) != myVector.at(x - 1) + 1)
        for(int y = myVector.at(x - 1) + 1; y <= myVector[x] - 1; y++)
            if(y >= kFirstNumber && y <= kLastNumber)
                missingValues.push_back(y);
}   

if(myVector.back() < kLastNumber)
    for(int x = myVector.back() + 1; x <= kLastNumber; ++x)
        if(x >= kFirstNumber && x <= kLastNumber)
            missingValues.push_back(x);

(My own solution was pretty ugly, so I replaced it with a C++ implementation of Pax's algorithm.)

score 0 · Accepted Answer

In Python:

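from functools import reduce

# l is the sorted input list; materialize the pairs so they can be iterated twice
consecutive=list(zip(l[0:-1],l[1:]))
duplicate=[ a for (a,b) in consecutive if a==b]
missing=reduce(lambda u,v:u+v, [list(range(a+1,b)) for (a,b) in consecutive], [])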
