c - C 中每 N 个元素中出现频率最高的元素

Question

我有一个大小为 [0, 8388608] 的“相对较小”整数 A[i] = [0, 131072] 的大数组 A，我想找到每个 N=32 元素中出现频率最高的元素。

什么会更快，

A.创建一个大小为131072的关联数组B，遍历32个元素，递增B[A[i]]，然后遍历B，找到最大值，将B中所有元素重置为0，重复|A|/32次.

B. qsort 每 32 个元素，找到 A[i] == A[i-1] 的最大范围（因此也是最频繁的元素），重复 |A|/32 次。

（编辑） C. 别的东西。

score 3 · Accepted Answer

对第一种方法的改进是可能的。不需要遍历B。它可以是一个大小为131072的数组

每次递增时B[A[i]]，请查看该单元格中的新值。然后，拥有一个全局highest_frequency_found_far. 这从零开始，但在每次递增之后，都应该将新值与这个全局值进行比较。如果它更高，则替换全局。

你也可以有一个全局value_that_was_associated_with_the_highest_count

for each block of 32 members of A ... {
    size_t B [131072] = {0,0,...};
    size_t highest_frequency_found_so_far = 0;
    int value_associated_with_that = 0;
    for(a : A) { // where A just means the current 32-element sub-block
        const int new_frequency = ++B[a];
        if (new_frequency > highest_frequency_found_so_far) {
            highest_frequency_found_so_far = new_frequency;
            value_associated_with_that = a;
        }
    }
    // now, 'value_associated_with_that' is the most frequent element

    // Thanks to @AkiSuihkonen for pointing out a really simple way to reset B each time.
    // B is big, instead of zeroing each element explicitly, just do this loop to undo
    // the ++B[a] from earlier:
    for(a : A) { --B[a]; }
}

score 2 · Accepted Answer

2

btree 呢？您最多只需要 32 个节点，并且可以预先声明它们。

于 2013-11-12T16:53:26.360 回答

c - C 中每 N 个元素中出现频率最高的元素

2 回答 2

Related

Reference