c - Finding the maximum/minimum number of occurrences in an integer array

Question

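I have just finished an algorithm that finds the values with the maximum and minimum number of occurrences in an input array of integers. My idea was to sort the array (so that all occurrences of a value are contiguous) and to use a <value:occurrences> pair to store the number of occurrences of each value.

It should be O(nlogn) in complexity, but I think there are some constant multipliers. What can I do to improve the performance?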

#include <stdio.h>
#include <stdlib.h>
#include "e7_8.h"

#define N 20
/*Structure for <value, frequencies_count> pair*/
typedef struct {
    int value;
    int freq;
} VAL_FREQ;


void  get_freq(int *v, int n, int *most_freq, int *less_freq) {

    int v_i, vf_i, current_value, current_freq;

    VAL_FREQ* sp = malloc(n*sizeof(VAL_FREQ));
    if(sp == NULL) exit(EXIT_FAILURE);

    mergesort(v,n);

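    vf_i = 0;
    current_value = v[0];
    current_freq = 1;
    for(v_i=1; v_i<n; v_i++) {
        if(v[v_i] == current_value) current_freq++;
        else{
            sp[vf_i].value = current_value;
            sp[vf_i++].freq = current_freq;
            current_value = v[v_i];
            current_freq = 1;
        }
    }
    /*Store the last run here; looping up to n+1 would read v[n], one past the end*/
    sp[vf_i].value = current_value;
    sp[vf_i++].freq = current_freq;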
    /*Finding max,min frequency*/
    int i, max_freq_val, max_freq, min_freq_val, min_freq;

    max_freq = sp[0].freq;
    max_freq_val = sp[0].value;
    min_freq = sp[0].freq;
    min_freq_val = sp[0].value;
    for(i=1; i<vf_i; i++) {
        if(sp[i].freq > max_freq) {
            max_freq = sp[i].freq;
            max_freq_val = sp[i].value;
        }
        if(sp[i].freq < min_freq) {
            min_freq = sp[i].freq;
            min_freq_val = sp[i].value;
        }
    }

    *most_freq = max_freq_val;
    *less_freq = min_freq_val;

    free(sp);
}

score 5 · Accepted Answer

Use a hash-table to implement a key-value map? That should give you O(n) expected time.*

* However, note that it's O(n²) in the worst case. This only occurs when all entries hash to the same bucket, and you effectively end up searching a linked list on every iteration! For a decent hash-table implementation, the probability of this occurring is very low indeed.
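A minimal sketch of the hash-table idea, under stated assumptions: the function name `get_freq_hash`, the `Slot` layout, and the multiplicative hash are all hypothetical choices for illustration, and the table uses open addressing with linear probing rather than the linked-list buckets mentioned in the footnote. The table is sized to at least twice the input length, so the expected cost per element stays constant.

```c
#include <stdlib.h>

/* One slot of the open-addressing table (hypothetical layout). */
typedef struct {
    int key;
    int count;   /* 0 marks an empty slot */
} Slot;

/* Multiplicative hash; cap must be a power of two. */
static size_t slot_for(int key, size_t cap) {
    return (size_t)((unsigned)key * 2654435761u) & (cap - 1);
}

/* Returns 0 on success, -1 on empty input or allocation failure. */
int get_freq_hash(const int *v, int n, int *most_freq, int *less_freq) {
    size_t cap = 1, i;
    Slot *table;
    int k, max_freq = 0, min_freq = 0;

    if (n <= 0) return -1;
    while (cap < (size_t)n * 2) cap <<= 1;   /* keep the load factor <= 0.5 */
    table = calloc(cap, sizeof *table);
    if (table == NULL) return -1;

    /* Counting pass: expected O(1) work per element. */
    for (k = 0; k < n; k++) {
        size_t s = slot_for(v[k], cap);
        while (table[s].count != 0 && table[s].key != v[k])
            s = (s + 1) & (cap - 1);         /* linear probing */
        table[s].key = v[k];
        table[s].count++;
    }

    /* Scan the occupied slots for the largest and smallest count. */
    for (i = 0; i < cap; i++) {
        if (table[i].count == 0) continue;
        if (table[i].count > max_freq) {
            max_freq = table[i].count;
            *most_freq = table[i].key;
        }
        if (min_freq == 0 || table[i].count < min_freq) {
            min_freq = table[i].count;
            *less_freq = table[i].key;
        }
    }
    free(table);
    return 0;
}
```

When several values share the maximum or minimum count, which one is reported depends on slot order, so ties are not resolved in any particular way here.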

score 4 · Accepted Answer

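Let's start from the fact that your algorithm is already O(n*log(n)), because every step is O(n) and the sort is O(n*log(n)). Whether it can be improved significantly depends on the kind of input you expect. Edit: unless, as seems to be the case, having the values sorted at the end of the process (by value in any case, not by number of occurrences) is not part of the requirements, in which case don't miss Oli Charlesworth's answer.

There are two concepts: the first is how many samples you are going to get (n); the second is how "concentrated" their values are, that is, how narrow or wide the range over which they can be distributed is (w = MAX_VALUE - MIN_VALUE).

If n is smaller than w (so your values are sparse), then your approach is already optimal and there is little room for improvement.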
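For this sparse case, where sorting remains the approach, the constant factor can still be trimmed a little. The following is only a sketch under assumptions: it uses the standard `qsort` in place of the asker's `mergesort`, the name `get_freq_sorted` is hypothetical, and it folds the max/min search into the scan over the sorted runs so that no array of pairs has to be allocated. The complexity stays O(n*log(n)).

```c
#include <stdlib.h>

/* Comparator for qsort; avoids the overflow that "a - b" could cause. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place, then scans the runs of equal values once.
   Returns 0 on success, -1 if n <= 0. */
int get_freq_sorted(int *v, int n, int *most_freq, int *less_freq) {
    int i, run_start = 0, max_freq = 0, min_freq = 0;

    if (n <= 0) return -1;
    qsort(v, (size_t)n, sizeof *v, cmp_int);

    for (i = 1; i <= n; i++) {
        /* A run ends at the end of the array or when the value changes.
           The i == n test comes first, so v[n] is never read. */
        if (i == n || v[i] != v[run_start]) {
            int freq = i - run_start;
            if (freq > max_freq) {
                max_freq = freq;
                *most_freq = v[run_start];
            }
            if (min_freq == 0 || freq < min_freq) {
                min_freq = freq;
                *less_freq = v[run_start];
            }
            run_start = i;
        }
    }
    return 0;
}
```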

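But if w is small and n is large, there is a lot to gain with the following method.

Suppose you know that you cannot get any value smaller than MIN_VALUE or larger than MAX_VALUE. Then you can use the value as an index into an array that collects the frequencies. This way you skip the sorting step (O(n*log(n))) and compute the frequencies in O(n).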

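int buffer_frequencies[MAX_VALUE - MIN_VALUE + 1];

//Reset all counters to zero (memset needs <string.h>)
memset(buffer_frequencies, 0, sizeof buffer_frequencies);

int* value_frequencies = buffer_frequencies;
value_frequencies -= MIN_VALUE; //Shift the beginning of the array, so that 
                                //you can use the value directly as the array index
//Negative indexes work on the shifted pointer. Strictly speaking, ISO C leaves a pointer
//formed outside the array undefined; buffer_frequencies[v[v_i] - MIN_VALUE] is the portable form
for(v_i=0; v_i < n; v_i++) {
  value_frequencies[v[v_i]]++;
}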

Or even (a possibly slightly faster version of the for loop, although a good compiler usually converts it to the most efficient form anyway):

int* p_v = v;
int* end_p_v = v+n;
for(; p_v < end_p_v; p_v++) {
  value_frequencies[*p_v]++;
  }

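Note that this method (both versions) is very sensitive to the input values: if you get a value below MIN_VALUE or above MAX_VALUE, you will access memory outside the array.

Then comes the second part of the algorithm: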

//Scan the whole range, MAX_VALUE included
int i;
max_freq = 0;
min_freq = 0; //0 means "no value seen yet"
for(i = MIN_VALUE; i <= MAX_VALUE; i++) {
    int f = value_frequencies[i];
    if(f == 0) continue; //this value never occurs in the input, so it cannot be the minimum
    if(f > max_freq) {
        max_freq = f;
        max_freq_val = i;
    }
    if(min_freq == 0 || f < min_freq) {
        min_freq = f;
        min_freq_val = i;
    }
}
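Both parts can be put together with the range check that the warning above calls for. This is a sketch, not the answer's exact code: the bounds `-100` and `100` are assumed values chosen only for illustration, the name `get_freq_counting` is hypothetical, and the counters are indexed with `v[i] - MIN_VALUE` instead of a shifted pointer.

```c
#include <stdlib.h>

/* Assumed bounds for illustration; the real ones depend on the input. */
#define MIN_VALUE (-100)
#define MAX_VALUE 100

/* Returns 0 on success, -1 on empty input, on a value outside
   [MIN_VALUE, MAX_VALUE], or on allocation failure. */
int get_freq_counting(const int *v, int n, int *most_freq, int *less_freq) {
    const int width = MAX_VALUE - MIN_VALUE + 1;
    int i, max_freq = 0, min_freq = 0;
    int *counts;

    if (n <= 0) return -1;
    counts = calloc((size_t)width, sizeof *counts);   /* zero-initialized */
    if (counts == NULL) return -1;

    /* Counting pass, O(n); reject values that would index out of bounds. */
    for (i = 0; i < n; i++) {
        if (v[i] < MIN_VALUE || v[i] > MAX_VALUE) {
            free(counts);
            return -1;
        }
        counts[v[i] - MIN_VALUE]++;
    }

    /* Scan pass, O(w); values that never occur are skipped. */
    for (i = 0; i < width; i++) {
        if (counts[i] == 0) continue;
        if (counts[i] > max_freq) {
            max_freq = counts[i];
            *most_freq = i + MIN_VALUE;
        }
        if (min_freq == 0 || counts[i] < min_freq) {
            min_freq = counts[i];
            *less_freq = i + MIN_VALUE;
        }
    }
    free(counts);
    return 0;
}
```

The total cost is O(n + w), which is why the approach pays off only when w is small compared with n*log(n).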
