1

假设我有两个人员列表,persons_a并且persons_b。我想尝试根据任意属性将列表中的每个人persons_a与一个人匹配persons_b,例如person.ageperson.town_from左右。

我怎样才能以最有效的方式在 Python 中做到这一点?我只是做一个for循环吗?

criteria = lambda a, b: a.age == b.age

result = []
for a in persons_a:
    for b in persons_b:
        if critera(a, b):
           result.add(a)
4

5 回答 5

5
criteria = lambda a, b: a.age == b.age
cross = itertools.product( persons_a, persons_b )
result = ( a for a, b in cross if criteria( a, b ) )

这更 Pythonic 并且更易于阅读。这itertools只是一种执行相同嵌套 for 循环的方法,因此它不会更有效,只是更容易阅读代码。

由于您必须遍历每个组合,因此您将无法获得比您所拥有的更好的结果O( n^2 ),因此除非您可以使循环短路或想出一个通过两个列表的贪婪算法的单次遍历,然后上面和您的是最优解。如果您有半结构化数据,因为有相同长度的列表也已排序,那么您可以通过一次通过列表来加速您的代码,但如果您没有任何像这样的结构,那么你将不得不坚持你的O( n^2 )算法。

于 2012-11-19T15:38:15.343 回答
3

通过引用字典方法,您的意思是否类似于以下内容?

class Store() :

    def __init__(self,types):
        self.a = {}
        for i in types : self.a[i]={}

    def addToStore(self,item) : # item is a dictionary
        for key in item.keys() :
            if key in self.a :
                self.a[key][item[key]] =   self.a[key].setdefault(item[key],[])+[item]

    def printtype(self,atype) :
        print atype
        for i in self.a[atype] : print self.a[atype][i]

if __name__=="__main__" :
    persons= Store(["age","place"])
    persons.addToStore({"name" : "Smith" , "place" : "Bury" , "age" : 32 })
    persons.addToStore({"name" : "Jones" , "place" : "Bolton" , "age" : 35 })
    persons.addToStore({"name" : "Swift" , "place" : "Radcliffe" , "age" : 32 })
    persons.addToStore({"name" : "Issac" , "place" : "Rochdale" , "age" : 32 })
    persons.addToStore({"name" : "Phillips" , "place" : "Bolton" , "age" : 26 })
    persons.addToStore({"name" : "Smith" , "place" : "Bury" , "age" : 41 })
    persons.addToStore({"name" : "Smith" , "place" : "Ramsbottom" , "age" : 25 })
    persons.addToStore({"name" : "Wilson" , "place" : "Bolton" , "age" : 26 })
    persons.addToStore({"name" : "Jones" , "place" : "Heywood" , "age" : 72 })
    persons.printtype("age")
    persons.printtype("place")
于 2012-11-19T21:00:36.020 回答
3

使用嵌套的 for 循环,如问题和一些先前的答案,给出了一个 O(m*n) 算法,其中 m, n 是列表ab的大小。(或者,如果列表大小相同,则为 O(n^2)。)要获得 O(n) 或 O(n log n) 算法,您可以 (1) 使用支持 O 的集合或字典数据结构(1) 或 O(log n) 成员资格查找或可以 (2)按标准元素对a和/或b进行排序,以允许 O(1) 或 O(log n) 匹配测试。请参阅下面的示例代码,它使用两种需要 O(n log n) 时间的排序,然后是一个 O(n) 比较通道,用于识别匹配对,总时间为 O(n log n)。

import collections
iTuple = collections.namedtuple('item',['data','age'])
p_a, p_b, n, p = [], [], 15, 19

for i in range(n):
   p_a.append(iTuple(i, ( 7*i)%p))
   p_b.append(iTuple(i, (11*i)%p))

sa = sorted(p_a, key=lambda x: x.age)
sb = sorted(p_b, key=lambda x: x.age)
#print ('sa: ', sa, '\n\nsb: ',sb, '\n')

ia, ib, result = 0, 0, []
while ia < n > ib:
   #print (ia, ib)
   if sa[ia].age == sb[ib].age:
      result.append([sa[ia], sb[ib]])
      print ('Match:', sa[ia], '\t', sb[ib],'\tat',ia,ib)
      ia, ib = ia+1, ib+1
   elif sa[ia].age  < sb[ib].age: ia += 1
   elif sa[ia].age >  sb[ib].age: ib += 1

#print ('Result:', result)

这是上述程序的输出。

Match: item(data=0, age=0)   item(data=0, age=0)    at 0 0
Match: item(data=11, age=1)  item(data=7, age=1)    at 1 1
Match: item(data=3, age=2)   item(data=14, age=2)   at 2 2
Match: item(data=14, age=3)  item(data=2, age=3)    at 3 3
Match: item(data=6, age=4)   item(data=9, age=4)    at 4 4
Match: item(data=9, age=6)   item(data=4, age=6)    at 5 5
Match: item(data=1, age=7)   item(data=11, age=7)   at 6 6
Match: item(data=4, age=9)   item(data=6, age=9)    at 8 7
Match: item(data=7, age=11)  item(data=1, age=11)   at 9 9
Match: item(data=2, age=14)  item(data=3, age=14)   at 11 11
Match: item(data=13, age=15) item(data=10, age=15)  at 12 12
Match: item(data=8, age=18)  item(data=12, age=18)  at 14 14
于 2012-11-19T16:07:29.070 回答
1

假设您能够根据该自定义标准对人员进行排序:

使用itertools.groupby构建一个列表的字典(应该O(n log n用于排序),然后找到另一个非常有效的(精确)匹配项(O(m)对于另一个列表中的每个人来说都是恒定的。)。


这是一个说明性的实现:

import random
import collections
import itertools
iTuple = collections.namedtuple('Person', ['town', 'age'])

# make up data
random.seed(1)
def random_person():
    age = random.randrange(19,49)
    town = random.choice("Edinburgh Glasgow Aberdeen".split())
    return iTuple(town, age)
n_f, n_m = 15, 20
females = [random_person() for x in xrange(n_f)]
males = [random_person() for x in xrange(n_m)]

# group by criterion of interest: age, town
by_age, by_town = lambda x: x.age, lambda x: x.town
males_by_age = dict((age, list(group)) for age, group in itertools.groupby(
        sorted(males, key=by_age), key=by_age))
males_by_town = dict((age, list(group)) for age, group in itertools.groupby(
        sorted(males, key=by_town), key=by_town))

然后您可以查询此字典以获取匹配列表:

# assign random matches according to grouping variable (if available)
print "matches by age:"
for person in females:
    candidates = males_by_age.get(person.age)
    if candidates:
        print person, random.choice(candidates)
    else:
        print person, "no match found"

print "matches by town:"
for person in females:
    candidates = males_by_town.get(person.town)
    if candidates:
        print person, random.choice(candidates)
    else:
        print person, "no match found"

输出类似于:

matches by age:
Person(town='Aberdeen', age=23) no match found
Person(town='Edinburgh', age=41) no match found
Person(town='Glasgow', age=33) no match found
Person(town='Aberdeen', age=38) Person(town='Edinburgh', age=38)
Person(town='Edinburgh', age=21) no match found
Person(town='Glasgow', age=44) Person(town='Glasgow', age=44)
Person(town='Edinburgh', age=41) no match found
Person(town='Aberdeen', age=32) no match found
Person(town='Aberdeen', age=25) Person(town='Edinburgh', age=25)
Person(town='Edinburgh', age=46) no match found
Person(town='Glasgow', age=19) no match found
Person(town='Glasgow', age=47) Person(town='Glasgow', age=47)
Person(town='Glasgow', age=25) Person(town='Glasgow', age=25)
Person(town='Edinburgh', age=19) no match found
Person(town='Glasgow', age=32) no match found
matches by town:
Person(town='Aberdeen', age=23) Person(town='Aberdeen', age=45)
Person(town='Edinburgh', age=41) Person(town='Edinburgh', age=27)
Person(town='Glasgow', age=33) Person(town='Glasgow', age=44)
Person(town='Aberdeen', age=38) Person(town='Aberdeen', age=45)
Person(town='Edinburgh', age=21) Person(town='Edinburgh', age=20)
Person(town='Glasgow', age=44) Person(town='Glasgow', age=24)
Person(town='Edinburgh', age=41) Person(town='Edinburgh', age=38)
Person(town='Aberdeen', age=32) Person(town='Aberdeen', age=34)
Person(town='Aberdeen', age=25) Person(town='Aberdeen', age=40)
Person(town='Edinburgh', age=46) Person(town='Edinburgh', age=38)
Person(town='Glasgow', age=19) Person(town='Glasgow', age=34)
Person(town='Glasgow', age=47) Person(town='Glasgow', age=42)
Person(town='Glasgow', age=25) Person(town='Glasgow', age=34)
Person(town='Edinburgh', age=19) Person(town='Edinburgh', age=27)
Person(town='Glasgow', age=32) Person(town='Glasgow', age=34)
于 2012-11-19T22:00:03.713 回答
1
result = [a for a in persons_a for b in persons_b if critera(a, b)]

列表理解形式的循环。

根据您计划对结果执行的操作,您也可以使用看起来几乎相同的生成器表达式:

result = (a for a in persons_a for b in persons_b if critera(a, b))

不同的是它不占用内存。相反,它会在请求时生成值,就像yield在生成器函数中一样。

于 2012-11-19T15:48:53.247 回答