4

我有大量成对的整数对。例如

pairs = [((3, 5), (5, 5)), ((1, 1), (2, 5)), ((5, 1), (4, 3))]

我还有一大堆整数对。例如,

nums = [(2, 5), (4, 2), (5, 2)]

我想从对中删除任何在 nums 中有任何对的对。例如,

nums = set(nums)
pairs = [((x1,y1),(x2,y2)) for ((x1,y1),(x2,y2)) in pairs if  not (set([(x1,y1),(x2,y2)]) & nums)]

这给

[((3, 5), (5, 5)), ((5, 1), (4, 3))]

问题是当pairs 和nums 很大时,这非常慢。你怎么能加快这个速度?

慢速输入示例:

import random
nums = [(random.randint(1,50000),random.randint(1,50000)) for i in xrange(1000000)]
pairs = [((random.randint(1,50000),random.randint(1,50000)), (random.randint(1,50000),random.randint(1,50000))) for i in xrange(8000000)]
4

3 回答 3

4

这将是最快的

nums=set(nums)
pairs= filter(nums.isdisjoint, pairs)

以下是时间:

In [1]: import random

In [2]: pairs=[(random.randint(0,50),random.randint(0,50)) for i in range (1000)]

In [3]: nums=[random.randint(0,1000) for i in range(500)]

In [4]: numset=set(nums)

In [5]: %timeit [(x,y) for (x,y) in pairs if not (set([x,y]) & numset)]
1000 loops, best of 3: 746 us per loop

In [6]: %timeit [(x,y) for (x,y) in pairs if x not in numset and y not in numset]
10000 loops, best of 3: 145 us per loop    

In [7]: %timeit filter(numset.isdisjoint, pairs)
10000 loops, best of 3: 95.1 us per loop
于 2013-11-12T12:44:55.170 回答
2

一个建议是制作numsa 集合,以便查找更快。

pairs = [(1,2),(2,3),(7,2)]
nums = {3,7} # Its a set now
print [(first, second) for first, second in pairs if first not in nums and second not in nums]

输出

[(1, 2)]
于 2013-11-12T12:41:24.350 回答
0

这应该是您可以衡量的一些解决方案

def rm_pairs(pairs, nums):
    cursor = 0
    while True:
        if cursor == len(pairs):
            break
        #print cursor
        p = pairs[cursor]
        for n in nums:
            if n in p:
                pairs.pop(cursor)
                cursor = 0
                break
        cursor += 1
于 2013-11-12T13:19:47.310 回答