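I have two lists, A and B, with the same number of elements, although the elements within each list are not necessarily distinct.
I would like to form a new list by randomly pairing elements from A with elements from B (the random pairing is important).
However, I also need to ensure that every pair in the resulting list is unique.
So far I have been approaching the problem as follows. It works for small lists, but it is clearly unsuitable for larger lists with many combinations.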
from random import shuffle
# Create a list of actors and events for testing
events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']
# Keep shuffling and pairing until every pair is unique
while True:
    # Randomize the elements of each list
    shuffle(events)
    shuffle(actors)
    # Merge the two lists into a new list of pairs
    edgelist = list(zip(events, actors))
    # If the new list of pairs has all unique elements, then it is a good solution, otherwise try again at random
    if len(edgelist) == len(set(edgelist)):
        break
# Display the solution
print('Solution obtained: ')
for item in edgelist:
    print(item)
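To give a feel for why the shuffle-and-retry approach above scales poorly, here is a rough sketch that estimates how often a single random shuffle already produces all-unique pairs. The function name `unique_pairing_rate` and its `trials` and `seed` parameters are hypothetical names chosen for illustration, and the estimate is only a Monte Carlo approximation.

```python
import random

def unique_pairing_rate(events, actors, trials=2000, seed=0):
    """Estimate the fraction of random pairings in which every pair is unique."""
    rng = random.Random(seed)
    events, actors = list(events), list(actors)  # work on copies
    hits = 0
    for _ in range(trials):
        # Shuffling one side is enough to randomize the pairing
        rng.shuffle(actors)
        if len(set(zip(events, actors))) == len(events):
            hits += 1
    return hits / trials

events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']

rate = unique_pairing_rate(events, actors)
print(f"estimated success rate per shuffle: {rate:.3f}")
if rate > 0:
    print(f"expected number of shuffles: about {1 / rate:.0f}")
```

The lower this rate, the more retries the loop needs on average, and the rate tends to fall as the lists grow and repeated values become more common.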
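Can anyone suggest a modification or an alternative approach that scales to larger input lists?
Thanks for your help.
Update
This turned out to be a more challenging problem than I originally thought. I think I now have a solution. It may not scale well, but it works for small and medium-sized lists. It checks whether a solution is possible before starting, so no assumptions about the distribution of the input lists are needed. I have also included a few lines of code to show that the frequency distributions of the resulting list match those of the original lists.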
import collections
from random import choice, shuffle
# Randomize the elements (events and actors are the test lists defined earlier)
shuffle(events)
# Make sure a solution is possible
combinations = len(set(events)) * len(set(actors))
assert combinations >= len(events) and combinations >= len(actors) and len(events) == len(actors), 'No solution possible!'
# Merge the two lists into a new list of pairs (this will contain duplicates)
edgelist = list(zip(events, actors))
# Search for duplicates
counts = collections.Counter(edgelist)
duplicates = [i for i in counts if counts[i] > 1]
duplicate_count = len(duplicates)
while duplicate_count > 0:
    # Get a single duplicate to address
    duplicate = duplicates[0]
    # Find the position of the duplicate in edgelist
    duplicate_pos = edgelist.index(duplicate)
    # Search for a replacement
    swap = choice(edgelist)
    swap_pos = edgelist.index(swap)
    # Exchange the event halves of the two pairs, unless that recreates an existing pair
    if (swap[0], duplicate[1]) not in edgelist:
        edgelist[duplicate_pos] = (swap[0], duplicate[1])
        edgelist[swap_pos] = (duplicate[0], swap[1])
    # Update duplicate count
    counts = collections.Counter(edgelist)
    duplicates = [i for i in counts if counts[i] > 1]
    duplicate_count = len(duplicates)
# Verify resulting edgelist and frequency distributions
print('Event Frequencies: ')
print(sorted(collections.Counter(events).values(), reverse=True))
print('Edgelist Event Frequencies: ')
print(sorted(collections.Counter(e for e, a in edgelist).values(), reverse=True))
print('Actor Frequencies: ')
print(sorted(collections.Counter(actors).values(), reverse=True))
print('Edgelist Actor Frequencies: ')
print(sorted(collections.Counter(a for e, a in edgelist).values(), reverse=True))
# Every pair must be unique, and no element may have been lost or added
assert len(set(edgelist)) == len(events) == len(actors)
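One caveat about the feasibility assert above: comparing the number of distinct combinations with the list length is a necessary condition, but it is not sufficient. For example, events `['P1','P1','P1','P2']` and actors `['A','A','A','B']` pass it (2 * 2 = 4 combinations for 4 elements), yet `P1` would need three different actors and only two exist, so the repair loop would never finish. A stricter test, sketched below, treats the pairing as a 0/1 matrix with prescribed row and column sums and applies the Gale-Ryser condition. This is a technique I am swapping in here, not part of the code above, and the function name `pairing_is_feasible` is hypothetical.

```python
from collections import Counter

def pairing_is_feasible(events, actors):
    """Gale-Ryser test: can the two lists be paired so that no pair repeats?"""
    if len(events) != len(actors):
        return False
    # Frequencies of each distinct value, events sorted from most to least frequent
    event_counts = sorted(Counter(events).values(), reverse=True)
    actor_counts = list(Counter(actors).values())
    running = 0
    for k, count in enumerate(event_counts, start=1):
        running += count
        # The k most frequent events can use each distinct actor at most k times
        if running > sum(min(c, k) for c in actor_counts):
            return False
    return True

events = ['P1','P1','P1','P2','P2','P2','P3','P3','P3','P4','P5','P6','P7','P7']
actors = ['IE','IE','ID','ID','IA','IA','IA','IC','IB','IF','IG','IH','IH','IA']

print(pairing_is_feasible(events, actors))
print(pairing_is_feasible(['P1','P1','P1','P2'], ['A','A','A','B']))
```

Running this check before the repair loop would rule out inputs for which no unique pairing exists, which is the case where the loop cannot terminate.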