java - Take n random elements from a List?

Question

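How can I take n random elements from an ArrayList<E>? Ideally, I'd like to be able to make successive calls to a take() method to get another x elements each time, without replacement.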

score 117 · Accepted Answer

There are two main ways.

Use Random#nextInt(int):

List<Foo> list = createItSomehow();
Random random = new Random();
Foo foo = list.get(random.nextInt(list.size()));

However, n successive calls are not guaranteed to return unique elements.

Use Collections#shuffle():
```
List<Foo> list = createItSomehow();
Collections.shuffle(list);
Foo foo = list.get(0);
```
This lets you get n unique elements by reading successive indices (assuming the list itself contains unique elements).
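A minimal sketch of the shuffle approach that leaves the caller's list alone. The class and method names (ShuffleSample, sample) are made up for illustration; only Collections.shuffle and List.subList come from the standard API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of any library.
final class ShuffleSample {
    // Returns n elements taken from n distinct positions of source.
    static <E> List<E> sample(List<E> source, int n, Random random) {
        if (n < 0 || n > source.size()) {
            throw new IllegalArgumentException("n must be between 0 and " + source.size());
        }
        List<E> copy = new ArrayList<>(source);     // shuffle a copy, not the original
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, n)); // first n entries of the shuffled copy
    }
}
```

The copy costs O(list size) time and memory, which is the price for not reordering the original.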

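If you're wondering whether there is a Java 8 Stream approach: no, there isn't a built-in one. The standard API has no such thing as Comparator#randomOrder() (yet?). You could try something like the following, which still satisfies the strict Comparator contract (although the distribution is pretty terrible):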

List<Foo> list = createItSomehow();
int random = new Random().nextInt();
Foo foo = list.stream().sorted(Comparator.comparingInt(o -> System.identityHashCode(o) ^ random)).findFirst().get();

Better to use Collections#shuffle() instead.

score 39 · Accepted Answer

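Most of the solutions proposed so far suggest either a full shuffle of the list, or successive random picks that check for uniqueness and retry when needed.

However, we can take advantage of Durstenfeld's algorithm (the most popular Fisher-Yates variant today).

Durstenfeld's solution moves the "struck" numbers to the end of the list by swapping each one with the last unstruck number at every iteration.

Because of this, we don't need to shuffle the whole list; we only run the loop for as many steps as the number of elements we need to return. Given a perfect random function, the algorithm ensures that the last N elements at the end of the list are 100% random.

Among the many real-world scenarios where we need to pick a predetermined (maximum) number of random elements from an array or list, this optimized method is very useful for card games such as Texas Poker, where you know a priori how many cards are used per game; usually only a limited number of cards is needed from the deck.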

public static <E> List<E> pickNRandomElements(List<E> list, int n, Random r) {
    int length = list.size();

    if (length < n) return null;

    //We don't need to shuffle the whole list
    for (int i = length - 1; i >= length - n; --i)
    {
        Collections.swap(list, i , r.nextInt(i + 1));
    }
    return list.subList(length - n, length);
}

public static <E> List<E> pickNRandomElements(List<E> list, int n) {
    return pickNRandomElements(list, n, ThreadLocalRandom.current());
}
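Since the struck elements pile up at the tail, the same partial shuffle can serve repeated draws without replacement, which is what the question's take() asks for. The following is only a sketch under that assumption; PartialShuffleDeck and its take method are hypothetical names, and the class works on a private copy so the caller's list is not reordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical class: deals items without replacement across calls.
final class PartialShuffleDeck<E> {
    private final List<E> items; // private copy; the tail holds items already dealt
    private final Random random;
    private int remaining;       // items[0..remaining) have not been dealt yet

    PartialShuffleDeck(List<E> source, Random random) {
        this.items = new ArrayList<>(source);
        this.random = random;
        this.remaining = items.size();
    }

    // Deals up to n items that no earlier call has returned.
    List<E> take(int n) {
        int count = Math.max(0, Math.min(n, remaining));
        List<E> dealt = new ArrayList<>(count);
        for (int k = 0; k < count; k++) {
            int last = remaining - 1;
            // Durstenfeld step: swap a random undealt item into the last undealt slot.
            Collections.swap(items, last, random.nextInt(remaining));
            dealt.add(items.get(last));
            remaining = last;
        }
        return dealt;
    }
}
```

Each call does work proportional to the number of items it returns; once the deck is exhausted, take returns an empty list.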

score 10 · Accepted Answer

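If you want to pick n elements from the list over and over again, without replacement, you are probably best off randomly permuting the elements and then taking them off in blocks of n. Randomly permuting the list guarantees statistical randomness for every block you take. Perhaps the easiest way to do this is Collections.shuffle.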
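One way the "permute, then take blocks of n" idea could look in code. This is a sketch, not a definitive implementation; ShuffledChunks and chunks are illustrative names, and the last block is simply shorter when the list size is not a multiple of n.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of any library.
final class ShuffledChunks {
    // Shuffles a copy of source once and splits it into consecutive blocks of n.
    static <E> List<List<E>> chunks(List<E> source, int n, Random random) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<E> copy = new ArrayList<>(source);
        Collections.shuffle(copy, random);
        List<List<E>> blocks = new ArrayList<>();
        for (int from = 0; from < copy.size(); from += n) {
            int to = Math.min(from + n, copy.size());
            blocks.add(new ArrayList<>(copy.subList(from, to))); // detach from the copy
        }
        return blocks;
    }
}
```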

score 7 · Accepted Answer

Simple and clear:

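// Returns `amount` distinct random integers from 1..maxRange.
// (The method name is illustrative; the original snippet had no signature.)
static List<Integer> pickRandomNumbers(int maxRange, int amount) {
    // fill a list with the integers 1..maxRange
    List<Integer> arrayList = new ArrayList<>();
    for (int i = 0; i < maxRange; i++) {
        arrayList.add(i + 1);
    }

    // shuffle list
    Collections.shuffle(arrayList);

    // copy the first `amount` shuffled numbers into the target list
    List<Integer> targetList = new ArrayList<>();
    for (int j = 0; j < amount; j++) {
        targetList.add(arrayList.get(j));
    }

    return targetList;
}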

score 6 · Accepted Answer

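A fair way to do this is to walk through the list and, at the nth iteration, compute the probability of picking the nth element. That probability is essentially the number of items you still need to pick divided by the number of elements remaining in the rest of the list. For example: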

public static <T> T[] pickSample(T[] population, int nSamplesNeeded, Random r) {
  T[] ret = (T[]) Array.newInstance(population.getClass().getComponentType(),
                                    nSamplesNeeded);
  int nPicked = 0, i = 0, nLeft = population.length;
  while (nSamplesNeeded > 0) {
    int rand = r.nextInt(nLeft);
    if (rand < nSamplesNeeded) {
      ret[nPicked++] = population[i];
      nSamplesNeeded--;
    }
    nLeft--;
    i++;
  }
  return ret;
}

(This code is copied from a page I wrote a while ago about picking a random sample from a list.)
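The same "needed over remaining" rule can also be written against a List, which avoids the reflective array creation. This is a sketch with made-up names (SelectionSampling, pickSample); a side effect of the technique is that the sample keeps the original order of the elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of any library.
final class SelectionSampling {
    static <T> List<T> pickSample(List<T> population, int needed, Random random) {
        if (needed < 0 || needed > population.size()) {
            throw new IllegalArgumentException("needed must be between 0 and " + population.size());
        }
        List<T> sample = new ArrayList<>(needed);
        int left = population.size(); // elements not yet visited, including the current one
        for (T item : population) {
            if (needed == 0) {
                break;
            }
            // Pick the current item with probability needed / left.
            if (random.nextInt(left) < needed) {
                sample.add(item);
                needed--;
            }
            left--;
        }
        return sample;
    }
}
```

When needed equals left the test always succeeds, so the loop cannot run out of elements before the sample is full.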

score 3 · Accepted Answer

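As noted in other answers, Collections.shuffle is not very efficient when the source list is large, because of the copying. Here is a Java 8 one-liner that:

- is efficient enough on random-access lists such as ArrayList, provided you don't need many elements from the source;
- does not modify the source;
- does NOT guarantee uniqueness, in case that isn't critical for you. If you pick 5 out of 100, there is a good chance (roughly 90%) that the elements will all be distinct.

Code: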

private static <E> List<E> pickRandom(List<E> list, int n) {
  return new Random().ints(n, 0, list.size()).mapToObj(list::get).collect(Collectors.toList());
}

However, for a list without fast random access (such as LinkedList), the complexity would be n*O(list_size).
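If uniqueness does matter, one possible variation is to draw from an unbounded stream of indices and keep only distinct ones. This is a sketch, not part of the one-liner above; DistinctIndexPick and pickDistinct are illustrative names. The guard on n is essential, because asking for more distinct indices than the list has would never finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class DistinctIndexPick {
    static <E> List<E> pickDistinct(List<E> list, int n, Random random) {
        if (n < 0 || n > list.size()) {
            throw new IllegalArgumentException("n must be between 0 and " + list.size());
        }
        if (n == 0) {
            return new ArrayList<>(); // also covers an empty list, where ints(0, 0) would throw
        }
        return random.ints(0, list.size()) // unbounded stream of indices in [0, size)
                .distinct()                // drop indices already seen
                .limit(n)                  // stop after n distinct indices
                .mapToObj(list::get)
                .collect(Collectors.toList());
    }
}
```

The closer n gets to the list size, the more draws are wasted on repeated indices, so this suits small samples from large lists.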

score 2 · Accepted Answer

Use the following class:

import java.util.Enumeration;
import java.util.Random;

public class RandomPermuteIterator implements Enumeration<Long> {
    int c = 1013904223, a = 1664525;
    long seed, N, m, next;
    boolean hasNext = true;

    public RandomPermuteIterator(long N) throws Exception {
        if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
        this.N = N;
        m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
        next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
    }

    public static void main(String[] args) throws Exception {
        RandomPermuteIterator r = new RandomPermuteIterator(100);
        while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
    }

    @Override
    public boolean hasMoreElements() {
        return hasNext;
    }

    @Override
    public Long nextElement() {
        next = (a * next + c) % m;
        while (next >= N) next = (a * next + c) % m;
        if (next == seed) hasNext = false;
        return  next;
    }
}

score 2 · Accepted Answer

Keep selecting a random element and make sure you don't pick the same element again:

public static <E> List<E> selectRandomElements(List<E> list, int amount)
{
    // Avoid an infinite loop: we can never collect more elements than the list holds
    if (amount >= list.size())
    {
        return list;
    }

    List<E> selected = new ArrayList<>();
    Random random = new Random();
    int listSize = list.size();

    // Get a random item until we got the requested amount
    while (selected.size() < amount)
    {
        int randomIndex = random.nextInt(listSize);
        E element = list.get(randomIndex);

        if (!selected.contains(element))
        {
            selected.add(element);
        }
    }

    return selected;
}

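In theory this could run forever, but in practice it is fine. The closer the requested amount gets to the size of the original list, the slower it runs, but that is hardly the point of selecting a random sublist, is it?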
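One caveat with checking selected.contains(element): if the list holds equal elements, there may be fewer distinct values than the requested amount, and then the loop really never ends. A possible variation, sketched here with illustrative names (RejectionSample, plus an extra Random parameter), is to reject repeated indices instead of repeated elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class RejectionSample {
    static <E> List<E> selectRandomElements(List<E> list, int amount, Random random) {
        if (amount <= 0) {
            return new ArrayList<>();
        }
        if (amount >= list.size()) {
            return new ArrayList<>(list); // nothing to choose; return a copy
        }
        // Remember chosen positions, so equal elements at different
        // positions do not block the loop.
        Set<Integer> chosen = new LinkedHashSet<>();
        while (chosen.size() < amount) {
            chosen.add(random.nextInt(list.size())); // a repeated index is ignored by the set
        }
        List<E> selected = new ArrayList<>(amount);
        for (int index : chosen) {
            selected.add(list.get(index));
        }
        return selected;
    }
}
```

The set lookup is also constant time on average, whereas List.contains scans the selected elements on every draw.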

score 0 · Accepted Answer

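The class below retrieves N items from a list of any type. If you provide a seed, it returns the same list on every run; otherwise the items of the new list change on each run. You can check its behaviour by running the main method.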

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class NRandomItem<T> {
    private final List<T> initialList;

    public NRandomItem(List<T> list) {
        this.initialList = list;
    }

    /**
     * Do not provide seed, if you want different items on each run.
     * 
     * @param numberOfItem
     * @return
     */
    public List<T> retrieve(int numberOfItem) {
        int seed = new Random().nextInt();
        return retrieve(seed, numberOfItem);
    }

    /**
     * The same seed will always return the same random list.
     * 
     * @param seed,
     *            the seed of random item generator.
     * @param numberOfItem,
     *            the number of items to be retrieved from the list
     * @return the list of random items
     */
    public List<T> retrieve(int seed, int numberOfItem) {
        Random rand = new Random(seed);

        Collections.shuffle(initialList, rand);
        // Create new list with the number of item size
        List<T> newList = new ArrayList<>();
        for (int i = 0; i < numberOfItem; i++) {
            newList.add(initialList.get(i));
        }
        return newList;
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("Foo", "Bar", "Baz", "Qux");
        int seedValue = 10;
        NRandomItem<String> r1 = new NRandomItem<>(l1);

        System.out.println(String.format("%s", r1.retrieve(seedValue, 2)));
    }
}

score 0 · Accepted Answer

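This solution does not modify the original list, and its cost depends only on the sample size, not on the size of the list.

To get a sample of 4 from a list of 7, we just select a random element out of all 7, then a random element out of the remaining 6, and so on. If we have already selected indices 4, 0 and 3, we next generate a random number out of 0, 1, 2, 3, which stand for indices 1, 2, 5, 6 respectively.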

static Random rand = new Random();

static <T> List<T> randomSample(List<T> list, int size) {
    List<T> sample = new ArrayList<>();

    for (int sortedSampleIndices[] = new int[size], i = 0; i < size; i++) {
        int index = rand.nextInt(list.size() - i);

        int j = 0;
        for (; j < i && index >= sortedSampleIndices[j]; j++)
            index++;
        sample.add(list.get(index));

        for (; j <= i; j++) {
            int temp = sortedSampleIndices[j];
            sortedSampleIndices[j] = index;
            index = temp;
        }
    }

    return sample;
}

score 0 · Accepted Answer

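All of these answers either require a modifiable list or run into performance issues.

Here is a Swift snippet that needs O(k) additional space, is guaranteed to run in O(k) time, and does not need a modifiable array. (It performs the shuffle in a map.)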

  func getRandomElementsFrom(array: [Int], count: Int = 8) -> [Int] {
    if array.count <= count {
        return array
    }

    var mapper = [Int: Int]()
    var results = [Int]()

    for i in 0..<count {
        let randomIndex = Int.random(in: 0..<array.count - i)

        if let existing = mapper[randomIndex] {
            results.append(array[existing])
        } else {
            let element = array[randomIndex]
            results.append(element)
        }

        let targetIndex = array.count - 1 - i
        mapper[randomIndex] = mapper[targetIndex] ?? targetIndex 
    }

    return results
}
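Since the rest of this thread is Java, here is a rough Java rendering of the same idea: the swaps of a partial Fisher-Yates shuffle are recorded in a map instead of being applied to the list. The names (SparseShuffleSample, sample) are illustrative, and the O(k) running time is an expected bound that relies on HashMap operations being constant time on average.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of any library.
final class SparseShuffleSample {
    static <E> List<E> sample(List<E> list, int k, Random random) {
        if (k >= list.size()) {
            return new ArrayList<>(list); // same shortcut as the Swift version
        }
        // moved.get(i) is the index currently "stored" at virtual position i;
        // positions missing from the map still hold their own index.
        Map<Integer, Integer> moved = new HashMap<>();
        List<E> result = new ArrayList<>(Math.max(k, 0));
        for (int i = 0; i < k; i++) {
            int bound = list.size() - i;  // size of the unpicked prefix
            int r = random.nextInt(bound);
            result.add(list.get(moved.getOrDefault(r, r)));
            int last = bound - 1;
            // Virtually move the last unpicked index into slot r.
            moved.put(r, moved.getOrDefault(last, last));
        }
        return result;
    }
}
```

The source list is only read, so an unmodifiable list works as input.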

score -1 · Accepted Answer

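The following method returns a new List of Min(n, list.size()) random elements taken from the parameter List list. Keep in mind that list is modified by every call: each call "consumes" the original list, removing the n random elements it returns: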

public static <T> List<T> nextRandomN(List<T> list, int n) {
  return Stream
    .generate(() -> list.remove((int) (list.size() * Math.random())))
    .limit(Math.min(list.size(), n))
    .collect(Collectors.toList());
}

Sample usage:

List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));

System.out.println(nextRandomN(list, 3).toString());
System.out.println(nextRandomN(list, 3).toString());
System.out.println(nextRandomN(list, 3).toString());
System.out.println(nextRandomN(list, 3).toString());

Sample output:

[8, 2, 3]
[4, 10, 7]
[1, 5, 9]
[6]
