There are several ways to do this. The first method:
1. use input parameter k to dynamically allocate an array of numbers
unsigned * numsArray = (unsigned *)malloc(sizeof(unsigned) * k);
2. run a loop that gets k random numbers and stores them into numsArray (be careful here to check each new random number to see if we have gotten it before, and if we have, get another random number, etc.)
3. sort numsArray
4. run a loop starting at the beginning of DataSet with your own incrementing counter dataCount and another counter numsCount, both starting at 0. Whenever dataCount is equal to numsArray[numsCount], grab the current data object, add it to your newly created random list, then increment numsCount.
5. The loop in step 4 can end either when numsCount reaches k (all k elements have been collected, and numsArray[k] would be out of bounds) or when dataCount reaches the end of the dataset.
6. The only other step that may need to be added, before any of this, is to use the next command of the object type to count how large the dataset is, so that you can bound your random numbers and check that k is less than or equal to that size.
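The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the Node type (a singly linked list of ints standing in for the DataSet) and the function names sample_k, count_nodes and cmp_unsigned are assumptions made for the example. It also uses rand() % n for simplicity, which has a slight modulo bias.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical forward-only data set: a singly linked list of ints. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Comparison callback for qsort: ascending order of unsigned values. */
static int cmp_unsigned(const void *a, const void *b) {
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Step 6: count the elements by walking the list once. */
static unsigned count_nodes(const Node *head) {
    unsigned n = 0;
    for (; head != NULL; head = head->next) n++;
    return n;
}

/* Copies the values of k distinct, randomly chosen elements into out,
 * in list order. Returns 0 on success, -1 if k exceeds the list size
 * or the allocation fails. out must have room for k ints. */
int sample_k(const Node *head, unsigned k, int *out) {
    unsigned n = count_nodes(head);
    if (k > n) return -1;
    if (k == 0) return 0;

    /* Step 1: allocate room for k indices. */
    unsigned *numsArray = (unsigned *)malloc(sizeof(unsigned) * k);
    if (numsArray == NULL) return -1;

    /* Step 2: draw k distinct indices in [0, n), redrawing on duplicates. */
    for (unsigned i = 0; i < k; ) {
        unsigned r = (unsigned)rand() % n;
        unsigned j = 0;
        while (j < i && numsArray[j] != r) j++;
        if (j == i) numsArray[i++] = r;   /* not seen before: keep it */
    }

    /* Step 3: sort the indices so one forward pass can serve them all. */
    qsort(numsArray, k, sizeof(unsigned), cmp_unsigned);

    /* Steps 4-5: single pass; stop once k elements have been collected. */
    unsigned dataCount = 0, numsCount = 0;
    for (const Node *p = head; p != NULL && numsCount < k;
         p = p->next, dataCount++) {
        if (dataCount == numsArray[numsCount]) out[numsCount++] = p->value;
    }
    assert(numsCount == k);   /* every index was < n, so all were reached */

    free(numsArray);
    return 0;
}
```

The duplicate check in step 2 is a linear scan, so drawing the indices costs O(k^2) comparisons in the worst case. That is fine for small k; for large k a different way of choosing distinct indices would be worth considering.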
The second method is to traverse the actual list MULTIPLE times.
// one must assume that once we get to the end, we can start over within the set again
1. run a while loop that checks for endOfData
a. increment a count variable that is initialized to 0 (this gives you the list size)
2. run a loop from 0 through k-1
a. generate a random number that you constrain to the list size
b. run a loop that moves through the dataset until it hits the rand element
c. compare that element with all other elements in your new list to make sure it isn't already in your new list (if it is, go back to step a)
d. store the element into your new list
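There are probably ways to speed up the second method by storing the current list position, so that if you generate a random number that is past the current pointer, you do not have to walk the whole list again from element 0 to reach the element you want to retrieve.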
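A rough sketch of the second method, including the idea of remembering the current list position, might look like this. The Node type and the names sample_k_seek, cur and curIndex are hypothetical, chosen only for the example, and the list is assumed to be a plain singly linked list that can be restarted from its head.

```c
#include <stdlib.h>

/* Hypothetical forward-only data set: a singly linked list of ints. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Stores pointers to k distinct, randomly chosen nodes in picked.
 * Returns 0 on success, -1 if k exceeds the list size.
 * picked must have room for k pointers. */
int sample_k_seek(const Node *head, unsigned k, const Node **picked) {
    /* Step 1: walk to the end once to learn the list size. */
    unsigned n = 0;
    for (const Node *p = head; p != NULL; p = p->next) n++;
    if (k > n) return -1;

    /* Cursor remembering where the previous seek stopped. */
    const Node *cur = head;
    unsigned curIndex = 0;

    /* Step 2: keep drawing until k distinct elements are stored. */
    for (unsigned i = 0; i < k; ) {
        unsigned target = (unsigned)rand() % n;   /* 2a: index in [0, n) */

        /* 2b: only restart from the head when the target lies behind us. */
        if (target < curIndex) {
            cur = head;
            curIndex = 0;
        }
        while (curIndex < target) {
            cur = cur->next;
            curIndex++;
        }

        /* 2c: skip the element if it is already in the new list. */
        unsigned j = 0;
        while (j < i && picked[j] != cur) j++;

        /* 2d: store it; on a duplicate, i is unchanged and we redraw. */
        if (j == i) picked[i++] = cur;
    }
    return 0;
}
```

The cursor only helps when the next target is at or past the current position. A target behind the cursor still costs a walk from the head, so on average this remains noticeably slower than a single sorted pass.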
A potential third method might be:
1. run a loop from 0 through k-1
a. generate a random number
b. use the generated random as a skip count, move skip count objects through the list
c. store the current item from the list into your new list
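The problem with the third method is that, without knowing how large the list is, you do not know how to bound the random skip count. Moreover, even if you did, the result would not really look like a randomly grabbed subset: a truly random pick could easily land on the last element of the list, whereas here, statistically, you might never reach the last element (i.e. not every element is given an equal chance of being selected).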
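To make that concern concrete, here is a minimal sketch of the skip-count idea. The Node type, the function name sample_k_skip and the maxSkip parameter (an upper bound on each skip) are all assumptions introduced for the example; the description above does not say how the skip would be bounded.

```c
#include <stdlib.h>

/* Hypothetical forward-only data set: a singly linked list of ints. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Skip-count sampling: skip a random number of nodes (0..maxSkip-1),
 * take the node reached, step past it, and repeat up to k times.
 * Returns how many values were stored in out, which is less than k
 * if the list runs out first. out must have room for k ints. */
unsigned sample_k_skip(const Node *head, unsigned k, unsigned maxSkip,
                       int *out) {
    if (maxSkip == 0) return 0;   /* avoid a modulo by zero */

    const Node *p = head;
    unsigned taken = 0;
    while (taken < k && p != NULL) {
        unsigned skip = (unsigned)rand() % maxSkip;   /* steps a and b */
        while (skip > 0 && p != NULL) {
            p = p->next;
            skip--;
        }
        if (p == NULL) break;       /* ran off the end of the list */
        out[taken++] = p->value;    /* step c */
        p = p->next;                /* never take the same node twice */
    }
    return taken;
}
```

With k = 5 and maxSkip = 3, for example, the fifth pick can sit at index 14 at the latest, so on a 20-element list the last five elements can never be selected. That is the unequal chance described above.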
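Arguably the fastest is method 1: generate the random numbers first, then traverse the list only once (yes, actually twice: once to get the size of the dataset list, and again to grab the random elements).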