
I have two lists that share some common elements:

p = [('link1/d/b/c', 'target1/d/b/c'), ('link2/a/g/c', 'target2/a/g/c'), ..., ('linkn/b/b/f', 'targetn/b/b/f')]

q = [['target1/d/b/c', 'target1', 123, 334], ['targetn/b/b/f', 'targetn', 23, 64], ... ,['targetx/f/f/f', 'targetx', 999, 888]]

I am trying to compare them, find the common elements, and then do some work with the results:

do_job('target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c')

Right now I use a simple and very slow algorithm:

for item in p:
   link = item[0]
   target = item[1]
   for item2 in q:
       target2 = item2[0]
       if target2 == target:
           do_some_job(...)

I think I need to compare the two lists and create a single list containing all of the matching elements, for example:

pq = [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'], ..., ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']]

and then call do_some_job(pq) once, instead of calling it every time a matching element is found.

How can I get this?

Regards

3 Answers

Use chain() to flatten both lists, then use set() and intersection() to get the common elements.

In [78]: from itertools import chain

In [79]: p
Out[79]: 
[('link1/d/b/c', 'target1/d/b/c'),
 ('link2/a/g/c', 'target2/a/g/c'),
 ('linkn/b/b/f', 'targetn/b/b/f')]

In [80]: q
Out[80]: 
[['target1/d/b/c', 'target1', 123, 334],
 ['targetn/b/b/f', 'targetn', 23, 64],
 ['targetx/f/f/f', 'targetx', 999, 888]]

In [81]: set(chain(*p)).intersection(set(chain(*q)))
Out[81]: set(['target1/d/b/c', 'targetn/b/b/f'])
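The intersection only yields the shared strings, not the full rows of q. As a minimal sketch (the names common and matching_rows are illustrative, not from the question), the set can then be used to pick out the rows of q whose first field is a shared value. Note that flattening compares every field of both lists, so this assumes no link string happens to equal some other field in q.

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# chain.from_iterable flattens without unpacking the outer list into arguments
common = set(chain.from_iterable(p)) & set(chain.from_iterable(q))

# Keep only the rows of q whose target path (first field) is shared with p
matching_rows = [row for row in q if row[0] in common]
```

This still does not attach the link to each row; a dictionary keyed by target is one way to do that.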

Or use a list comprehension with short-circuiting:

In [86]: [j for i in p for j in i if j in (z for y in q for z in y)]
Out[86]: ['target1/d/b/c', 'targetn/b/b/f']

Or use any():

In [87]: [j for i in p for j in i if any (j==z for y in q for z in y)]
Out[87]: ['target1/d/b/c', 'targetn/b/b/f']

Timings:

In [93]: %timeit set(chain(*p)).intersection(set(chain(*q)))
100000 loops, best of 3: 7.38 us per loop                     ##  winner

In [94]: %timeit [j for i in p for j in i if j in (z for y in q for z in y)]
10000 loops, best of 3: 24.9 us per loop

In [95]: %timeit [j for i in p for j in i if any (j==z for y in q for z in y)]
10000 loops, best of 3: 27.4 us per loop

In [97]: %timeit [x for x in chain(*p) if x in chain(*q)]
10000 loops, best of 3: 12.6 us per loop
answered 2012-10-23T10:20:35.413

You should probably use a dictionary:

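target_to_link = dict((v, k) for (k, v) in p)
for item in q:
    if item[0] not in target_to_link:
        continue  # no link in p for this target (e.g. 'targetx/f/f/f')
    args = item + [target_to_link[item[0]]]
    do_some_job(*args)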

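The target_to_link dictionary gives you the link corresponding to each target. Just make sure no target appears more than once in p (that is, that no two links point to the same target), because later entries would overwrite earlier ones...

In the for loop, we just build a temporary argument list args, combining your item (e.g. ['target1/d/b/c', 'target1', 123, 334]) with the corresponding link, and pass it on using the function(*args) syntax...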
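The dictionary lookup and the *args unpacking can be sketched end to end as follows. The body of do_some_job is a hypothetical stand-in (the question never shows it), and using dict.get to skip targets that have no link in p is an assumption about the desired behavior.

```python
def do_some_job(target_path, name, first, second, link):
    # Hypothetical stand-in for the real job: just return the arguments
    return (target_path, name, first, second, link)

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Map each target to its link so every lookup is a single hash probe
target_to_link = {target: link for link, target in p}

results = []
for item in q:
    link = target_to_link.get(item[0])
    if link is None:
        continue  # this target has no link in p
    args = item + [link]  # the four fields of the row followed by the link
    results.append(do_some_job(*args))  # *args spreads the list into positional arguments
```

Building the dictionary takes one pass over p and the loop takes one pass over q, in place of the nested loops from the question.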


If you need to loop over p instead, you can build a dictionary such as

target_to_args = dict((k[0],k[1:]) for k in q)

and then do something like

for (link, target) in p:
    if target not in target_to_args:
        continue  # this target has no entry in q (e.g. 'target2/a/g/c')
    args = [target] + target_to_args[target] + [link]
    do_some_job(*args)
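The same lookup can also produce the combined pq list the question asks for, so that the job is called once on the whole result. This is a sketch under the assumption that targets missing from q should simply be left out.

```python
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Map each target path to the remaining fields of its row in q
target_to_args = {row[0]: row[1:] for row in q}

# One combined row per pair in p whose target also appears in q
pq = [[target] + target_to_args[target] + [link]
      for link, target in p
      if target in target_to_args]
```

The rows of pq follow the order of p, and each has the shape [target, name, number, number, link] shown in the question.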
answered 2012-10-23T10:20:39.737

A list comprehension with chain should work:

from itertools import chain
[x for x in chain(*p) if x in chain(*q)]
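In that comprehension, chain(*q) is rebuilt and scanned linearly for every element of p. A possible variant, sketched here with the illustrative names q_values and common_in_order, flattens q into a set once, which keeps the order of p while making each membership test a hash lookup. It assumes every value in q is hashable, which holds for the strings and integers shown.

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Flatten q once into a set instead of re-chaining it per element
q_values = set(chain.from_iterable(q))

# Walk the flattened p in order and keep the values that also occur in q
common_in_order = [x for x in chain.from_iterable(p) if x in q_values]
```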
answered 2012-10-23T10:29:13.463