1

我想了解水库采样算法,我们从给定的 S 元素集合中选择 k 个元素,使得 k <= S。

在 wiki 上给出的算法中:

array R[k];    // result
integer i, j;

 // fill the reservoir array

for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability

for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done

如果我理解正确的话,我们首先从集合中选择 k 个元素,然后不断解析 S 的 i 个元素,生成 1 到 i 范围内的随机 no j 并将元素 j 替换为 S[i]。

如果要采样的集合 K 非常大,它看起来很好,但是如果我想从无限大小(至少未知大小)的链表中随机选择 1 个元素,我将如何使用这个算法来做...... ?

4

2 回答 2

3

The reservoir sampling algorithm works on any sized linked list, even one whose length is unknown in advance. In fact, one of the main selling points of reservoir sampling is that it works on data streams whose size is not known in advance.

If you set k = 1 and then run the normal reservoir sampling algorithm, then you should correctly get a uniformly random element from the list.

Hope this helps!

于 2013-01-20T21:25:50.133 回答
0

我已经实现了一种不同的算法来解决这个问题,这是我的代码

static char[] solution2(String stream, int K) {
    HashSet<Integer> set = new HashSet();
    char[] list = new char[K];
    stream = stream.concat(stream2);
    Random ran = new Random();
    for (int i = 0; i < K; i++) {
        int y = ran.nextInt(stream.length());
        if (set.add(y)) {
            list[i] = stream.charAt(y);
        } else {
            i--; //skip this iteration since its duplicate number
        }
    }
    return list;
}

无需遍历所有流值,只需选择一个随机值 J 并从流中获取 N[J]。

于 2016-09-23T17:08:50.430 回答