python - Randomizing chain from itertools

Question

I am copying an example from the Python docs.

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

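How can we randomize the order of the values we get, while keeping the result of powerset lazily evaluated?

EDIT: The reason I want this is that I want to compute the sums of the derived sets and stop as soon as I find two sets with the same sum. If I am not mistaken, the problem is NP-complete.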

score 2 · Accepted Answer

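Here is another idea: store the combinations generators and draw from them at random until you have consumed them all. This also randomizes the order of the set sizes.

EDIT: I assume you don't care about the order of the elements within a single set, since you will be summing them. If you do, you can shuffle next_value before the yield. Note that combinations() yields tuples, so convert it to a list first, because random.shuffle() only works on mutable sequences.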

import itertools
import random

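def random_powerset(l):
    # One lazy combinations() generator per subset size.
    combs = [itertools.combinations(l, i) for i in range(len(l) + 1)]
    while combs:
        # Pick one of the remaining generators at random.
        comb_index = random.randrange(len(combs))
        try:
            next_value = next(combs[comb_index])
            yield next_value
        except StopIteration:
            # This size is exhausted; stop drawing from it.
            combs.pop(comb_index)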

Output:

In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 2), (0, 1, 2), (1, 2), (), (0,), (1,), (2,)]

In : list(random_powerset(range(3)))
Out: [(0, 1, 2), (0,), (), (0, 1), (1,), (0, 2), (1, 2), (2,)]

In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 1, 2), (0, 2), (), (0,), (1,), (1, 2), (2,)]

In : list(random_powerset(range(3)))
Out: [(), (0,), (0, 1), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]

In : list(random_powerset(range(3)))
Out: [(), (0, 1), (0,), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]

In : list(random_powerset(range(3)))
Out: [(0, 1), (0,), (0, 2), (1, 2), (), (1,), (2,), (0, 1, 2)]

In : list(random_powerset(range(3)))
Out: [(), (0, 1, 2), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
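
For the use case in the question's edit (stop as soon as two subsets share a sum), a lazy generator pairs naturally with a dictionary of the sums seen so far. Here is a minimal sketch, assuming Python 3; first_equal_sum_pair is a hypothetical helper name and is not part of the answer above:

```python
import itertools
import random

def random_powerset(items):
    # Same idea as the answer above: one combinations() generator per size,
    # drawn from at random until all of them are exhausted.
    combs = [itertools.combinations(items, r) for r in range(len(items) + 1)]
    while combs:
        index = random.randrange(len(combs))
        try:
            yield next(combs[index])
        except StopIteration:
            combs.pop(index)

def first_equal_sum_pair(items):
    # Hypothetical helper: returns two different subsets with the same sum,
    # or None if every subset sum is unique.
    seen = {}  # sum -> first subset that produced it
    for subset in random_powerset(list(items)):
        total = sum(subset)
        if total in seen:
            return seen[total], subset  # stop early, the rest is never generated
        seen[total] = subset
    return None

pair = first_equal_sum_pair([1, 2, 3])
print(pair)
```

With [1, 2, 3] the only collision is between (3,) and (1, 2), so those are the two subsets returned, in whichever order they happened to be drawn.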

score 2 · Accepted Answer

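itertools.combinations() gives us its results in the order of its input. Given that, we can shuffle our input list to produce the elements in a random order (obviously, there will be far fewer possible orderings of the results).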

from itertools import chain, combinations
from random import shuffle

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s) + 1))
    shuffle(lengths)
    return chain.from_iterable(combinations(s, r) for r in lengths if not shuffle(s))

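(This is a little ugly: we know shuffle(s) always returns None, which is falsy, so we can add it as a condition to make sure it runs before every call to combinations().)

We generate the list of lengths up front so that we can shuffle it as well.

It is not perfectly random (there is still an order: for example, all elements of length n are clustered together, and the order of those elements depends on the random order of the input), but there is a fair amount of randomness, if that is enough for you.

Example output: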
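
If the `if not shuffle(s)` trick feels too clever, the same behaviour can be spelled out as a generator function. This is only a sketch, assuming Python 3.3+ for `yield from`; the name random_powerset_gen is made up here to keep it apart from the version above:

```python
from itertools import combinations
from random import shuffle

def random_powerset_gen(iterable):
    # Hypothetical, more explicit variant of the function above.
    s = list(iterable)
    lengths = list(range(len(s) + 1))
    shuffle(lengths)  # random order of subset sizes
    for r in lengths:
        shuffle(s)  # reshuffle the input before each size
        # combinations() copies s when it is created, so later
        # shuffles do not disturb the size currently being yielded.
        yield from combinations(s, r)

print(list(random_powerset_gen(range(3))))
```

It stays lazy in the same way: nothing for a given size is computed until the caller asks for it.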

>>> list(random_powerset(range(3)))
[(), (2,), (0,), (1,), (2, 1), (2, 0), (1, 0), (1, 2, 0)]
>>> list(random_powerset(range(3)))
[(), (0, 1), (0, 2), (1, 2), (0, 1, 2), (2,), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1, 2), (2,), (1,), (0,), (0, 2), (0, 1), (2, 1), ()]
>>> list(random_powerset(range(3)))
[(1, 2, 0), (0,), (2,), (1,), (), (0, 1), (0, 2), (1, 2)]
>>> list(random_powerset(range(3)))
[(), (2, 1), (2, 0), (1, 0), (0,), (2,), (1,), (2, 1, 0)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2), (0, 2, 1), (), (1,), (0,), (2,)]

I think this is the best you can do without making it non-lazy.

score 2 · Accepted Answer

Here is a lazy and random solution:

import random

def powerset(seq):
    n = 2**len(seq)
    used = set()
    while len(used) < n:
        choice = random.randint(0, n - 1)
        if choice not in used:
            used.add(choice)
            binary = bin(choice)[2:].zfill(len(seq))
            yield [i[1] for i in zip(binary, seq) if i[0] == '1']
            # or, with itertools.compress on Python 2.7 or later; the digits must
            # be converted to ints first, because the string '0' is truthy:
            # yield list(itertools.compress(seq, map(int, binary)))

print(list(powerset([1,2,3])))
print(list(powerset([1,2,3])))
# output:
# [[3], [1], [2, 3], [], [1, 2], [2], [1, 3], [1, 2, 3]]
# [[2, 3], [1, 3], [1], [1, 2, 3], [1, 2], [2], [3], []]

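If you think of the combinations of [1, 2, 3] in binary, each combination can be labelled with a unique binary identifier, and there are always 2**len(seq) combinations. So: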

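1. Randomly pick an integer between 0 and 2**len(seq) - 1.
2. Check that we haven't used it before (if we have, draw again).
3. Convert it to binary.
4. Zip it with seq.
5. If the zipped binary digit is '0', exclude that element from the output.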
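
To make the labelling concrete, the decoding step can be pulled out into a small function and run over every index for [1, 2, 3]. This is just an illustration; subset_from_index is a hypothetical name, not something from the code above:

```python
def subset_from_index(seq, index):
    # Hypothetical helper: decode one integer in range(2**len(seq)) into a subset.
    # The binary string is left-padded so that digit k lines up with seq[k].
    binary = bin(index)[2:].zfill(len(seq))
    return [item for bit, item in zip(binary, seq) if bit == '1']

seq = [1, 2, 3]
for index in range(2 ** len(seq)):
    print(index, bin(index)[2:].zfill(len(seq)), subset_from_index(seq, index))
```

For example, index 5 is '101' in binary, which keeps the first and third elements and gives [1, 3].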

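This is lazy and works for large seq.

A small caveat:

There is a possible problem, though it probably won't matter to you. Towards the end of the sequence you may run into a lot of redraws (which could take some time). Since the probability of drawing an already-drawn number is number of successful draws / 2**len(seq), at a given draw g the expected number of draws needed to find an unused number is: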

n / (n - g)
#where n = 2**len(seq)

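This is fine provided that n is small, or that for large n you have g << n (one or both of these is quite likely, so neither is much of a problem). In fact, with large n you can drop used and the repeat check altogether, since the expected number of iterations before the first repeat is on the order of n**0.5 (the birthday problem).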
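
To get a feel for the numbers, the formula can be evaluated directly. A small sketch, assuming Python 3 division and an arbitrary example size of len(seq) == 10; expected_draws is a made-up helper name:

```python
def expected_draws(n, g):
    # Expected number of draws needed to hit an unused index
    # when g of the n indices have already been used.
    return n / (n - g)

n = 2 ** 10  # example only: len(seq) == 10
for g in (0, n // 2, n - 10, n - 1):
    print(g, expected_draws(n, g))

# Expected total draws to enumerate the whole powerset: sum over every g.
total = sum(expected_draws(n, g) for g in range(n))
print(total / n)  # average draws per yielded subset
```

Halfway through, each new subset costs 2 draws on average, and the very last one costs n. Summed over the whole run this is the coupon collector total, about n * ln(n) draws, so the redraws only hurt if you really do exhaust the powerset.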

score 1 · Accepted Answer

You can improve on Lattyware's solution somewhat if you go beyond itertools.chain:

from itertools import combinations
from random import shuffle

def chain_random(iterables):
    iterables = list(iterables)
    icount = len(iterables)
    if icount == 0:
        return
    while icount > 1:
        # Shuffle the remaining iterators and draw from whichever ends up last.
        shuffle(iterables)
        try:
            yield next(iterables[-1])
        except StopIteration:
            iterables.pop()
            icount -= 1
    # Only one iterator left: drain it directly.
    for element in iterables[0]:
        yield element

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain_random(combinations(s, r) for r in lengths if not shuffle(s))

Example output:

>>> list(random_powerset(range(3)))
[(), (2, 1, 0), (1, 0), (1, 2), (2,), (0, 2), (1,), (0,)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2, 1), (2,), (), (0, 2), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (0, 2), (0,), (1, 2), (2, 0, 1), (1,), (2,)]
>>> list(random_powerset(range(3)))
[(), (1, 2), (2,), (1, 0), (0,), (2, 0), (1,), (1, 0, 2)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (2,), (0, 2), (1, 2), (1,), (1, 2, 0), (0,)]
>>> list(random_powerset(range(3)))
[(0, 2, 1), (0,), (), (2, 0), (1,), (2, 1), (2,), (0, 1)]

itertools is written in C, so chain_random will be slower than itertools.chain. But you get more randomization this way.
