0

I need to pick a fixed number of elements uniformly at random from an array stored in a file. I want to read the file only once and keep just the picked elements, because the array can be very long and I don't want to hold it in memory. Each subarray of that size should be equally likely to be chosen. I also don't know the size of the array in advance.

How can I do it?

4

2 Answers

1

You need something called Reservoir Sampling.

It's explained pretty well in this blog:

http://gregable.com/2007/10/reservoir-sampling.html
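
As a minimal sketch of the idea (the classic single-pass variant often called Algorithm R), assuming the elements can be read one at a time as an iterable; the function name `reservoir_sample` and its parameters are made up for illustration:

```python
import random

def reservoir_sample(items, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    Only the k kept items are held in memory. If the iterable has fewer
    than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

For a file with one element per line, something like `reservoir_sample((line.rstrip("\n") for line in f), 10)` inside a `with open(...) as f:` block reads the file once and keeps only 10 lines at any time.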

Answered 2014-12-19T09:53:13.863
0

If you don't care about the exact number of elements you pick, an easy solution is to read the file and keep each element independently with a fixed probability.
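
A rough sketch of that fixed-probability approach, assuming the elements arrive as an iterable; `bernoulli_sample` and the parameter `p` are illustrative names, not part of any library:

```python
import random

def bernoulli_sample(items, p, rng=random):
    """Yield each item independently with probability p (0 <= p <= 1).

    The number of items yielded is random; on average it is p times
    the total number of items.
    """
    for item in items:
        # rng.random() is uniform on [0, 1), so this is true with probability p.
        if rng.random() < p:
            yield item
```

Because it is a generator, nothing beyond the current element is kept in memory unless the caller stores the results.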

If you want an exact number, one approach is to find out how many elements the file contains before sampling, choose the positions of the elements you want (as a list of integers), then read the file and pick the elements at those positions.
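
The index-based approach could be sketched as follows, assuming the element count `n` is already known (for example from a first counting pass over the file); `sample_known_size` is a hypothetical helper name:

```python
import random

def sample_known_size(items, n, k, rng=random):
    """Pick exactly k of the n items, each k-subset being equally likely.

    Assumes the iterable yields exactly n items and that k <= n.
    """
    # Choose k distinct positions up front, without replacement.
    wanted = set(rng.sample(range(n), k))
    # Single pass over the data, keeping only the chosen positions.
    return [item for i, item in enumerate(items) if i in wanted]
```

Note that if the count has to be computed from the file itself, this costs two passes, which is the trade-off compared with a single-pass method.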

Answered 2014-12-19T09:36:18.870