python - TypeError: Population must be a sequence or set. For dicts, use list(d)

Question

I import two text files as:

first_names = set(map(str.strip, open('first_names.all.txt')))
last_names = set(map(str.strip, open('last_names.all.txt')))

These are just single-column text files that look like this:

--------------------
a'isha
a'ishah
a-jay
aa'isha
aa'ishah
aaban

Printing the types:

print(type(first_names))

print(type(last_names))

<class 'set'>
<class 'set'>

I then try to draw a sample of 5,000 pairs from the Cartesian product of first_names and last_names:

random.sample(itertools.product(first_names, last_names), 5000)

But I get the error:

TypeError: Population must be a sequence or set.  For dicts, use list(d).

score 2 · Accepted Answer

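sample does not work on most iterator objects; it needs a sequence or a set. Converting the product to a list or set, however, would use a lot of memory. Alternatively, since you have already read the names into two sets, call choice on each collection 5,000 times instead of using product. choice needs an indexable sequence, so convert each set to a list first:

first_list, last_list = list(first_names), list(last_names)
names = [(random.choice(first_list), random.choice(last_list)) for _ in range(5000)]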

Note: this has the pitfall of possibly producing duplicate pairs, which sampling from product would not.

One way to overcome this is to add the samples to a set, which discards duplicates, and keep adding until the desired count is reached:

names = set()
while len(names) != 5000:
    names.add(tuple(random.sample(first_names, k=1) + random.sample(last_names, k=1)))

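Warning: starting with Python 3.9, passing a set to random.sample() is deprecated, and from Python 3.11 it raises a TypeError. The documentation says:

Deprecated since version 3.9: In the future, the population must be a sequence. Instances of set are no longer supported. The set must first be converted to a list or tuple, preferably in a deterministic order so that the sample is reproducible.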
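Following the documentation's advice, here is a minimal sketch of the same "keep adding until the set is full" idea that should also run on newer Python versions. The function name sample_unique_pairs and the seed parameter are hypothetical, introduced only for illustration. The sets are sorted into lists so that a fixed seed gives a reproducible result.

```python
import random


def sample_unique_pairs(first_names, last_names, k, seed=None):
    """Draw k distinct (first, last) pairs without building the full product."""
    # Sorting yields lists in a deterministic order, so a fixed seed
    # reproduces the same sample across runs.
    firsts = sorted(first_names)
    lasts = sorted(last_names)
    if k > len(firsts) * len(lasts):
        raise ValueError("k exceeds the number of distinct pairs")
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < k:
        # random.choice needs an indexable sequence, hence the lists above.
        # Adding to a set silently drops any duplicate pair.
        pairs.add((rng.choice(firsts), rng.choice(lasts)))
    return pairs
```

The loop is cheap when k is small relative to the number of possible pairs, as with 5,000 draws from two large name files. It slows down as k approaches the total, because most draws are then duplicates.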

score 1 · Accepted Answer

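You cannot apply random.sample directly to an itertools.product object. Try materializing it first. A list is used here because random.sample rejects sets from Python 3.11 on, and the pairs are already unique since both inputs are sets. Note that this holds every pair in memory:

p = list(itertools.product(first_names, last_names))
random.sample(p, 5000)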
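If the full product is too large to hold in memory, one possible alternative is to sample integer indices instead and decode each index into a pair. This is a sketch, not part of the original answers: the function name sample_product_pairs and the seed parameter are hypothetical. It relies on random.sample accepting a range object, which it samples without expanding into a list.

```python
import random


def sample_product_pairs(first_names, last_names, k, seed=None):
    """Sample k distinct pairs from the Cartesian product by index."""
    # Sorted lists give a stable index-to-name mapping.
    firsts = sorted(first_names)
    lasts = sorted(last_names)
    total = len(firsts) * len(lasts)
    rng = random.Random(seed)
    # Raises ValueError if k is larger than the number of pairs.
    indices = rng.sample(range(total), k)
    pairs = []
    for index in indices:
        # Treat index as a position in a len(firsts) x len(lasts) grid.
        row, col = divmod(index, len(lasts))
        pairs.append((firsts[row], lasts[col]))
    return pairs
```

Because the indices are drawn without replacement and each index maps to exactly one pair, the result contains no duplicates, and memory use is proportional to the two name lists plus k.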
