
I have a list of dictionaries that looks like this:

[{TYPE, OBJECT_ID, ACTOR, EXTRA_FIELDS}, ...]   

I want to go through it, aggregate the duplicates of {TYPE, OBJECT_ID}, and turn ACTOR into a list:

Starting with:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', {..}}, 
      {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', {..}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', {..}}]

Ending with:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob', 'dave'], {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': ['sam'], {...}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': ['bob'], {...}} ]

The EXTRA_FIELDS don't need to be merged; they can simply take the data from one of the aggregated items.

How can I do this in Python?


5 Answers


Assuming input is a list of tuples (not sets), then:

TYPE= 0
OBJECT_ID= 1
ACTOR= 2
EXTRA_INFO= 3
keys= set( [ ( e[TYPE] , e[OBJECT_ID] ) for e in input ] )
output= { k: [ ( e[ACTOR] , e[EXTRA_INFO] ) for e in input if ( e[TYPE] , e[OBJECT_ID] ) == k ] for k in keys }

Or, if you prefer a one-liner:

output= { k: [ ( e[2] , e[3] ) for e in input if ( e[0] , e[1] ) == k ] for k in [ ( e[0] , e[1] ) for e in input ] }

Assuming input is a list of dicts instead, this becomes:

keys= set( [ ( e['type'] , e['obj_id'] ) for e in input ] )
output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in keys }

Or:

output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in [ ( e['type'] , e['obj_id'] ) for e in input ] }

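Of course, you could also write out by hand what these comprehensions do, but I wouldn't recommend it unless the data is so large that you start running into performance problems that call for low-level optimization.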
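For comparison: each comprehension above rescans the whole input once per key, so the work grows with (number of distinct keys) × (number of items). A hand-written single pass with collections.defaultdict builds the same mapping in one sweep. This is a minimal sketch, assuming each dict has 'type', 'obj_id', 'actor' and 'extra_info' keys as in the dict version above; the function name group_actors and the None extras in the sample are hypothetical.

```python
from collections import defaultdict

def group_actors(items):
    """Group dicts by (type, obj_id) in a single pass over the input.

    Assumes every dict has 'type', 'obj_id', 'actor' and 'extra_info' keys,
    the same assumption the comprehensions above make.
    """
    grouped = defaultdict(list)  # missing keys start as an empty list
    for e in items:
        key = (e['type'], e['obj_id'])
        # Same value shape as the comprehension: a list of {'actor', 'extra_info'} dicts
        grouped[key].append({'actor': e['actor'], 'extra_info': e['extra_info']})
    return dict(grouped)  # hand back a plain dict

# Question's sample data; the None extras are hypothetical placeholders
sample = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_info': None},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_info': None},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_info': None},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra_info': None},
]
result = group_actors(sample)
```

The trade-off is readability versus speed: the comprehensions are compact, while the single pass stays linear in the input size.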

answered 2013-08-10T01:05:19.467

I'll call your list alist.

actors = {}
extra = {}
for x in alist:
    key = (x['type'], x['obj_id'])
    if key in actors:  # dict.has_key() no longer exists in Python 3
        actors[key].append(x['actor'])
    else:
        actors[key] = [x['actor']]  # start the list with this first actor
    extra[key] = x['extra']

outlist = []
for k in actors:
    x = {}
    x['type'], x['obj_id'], x['actor'], x['extra'] = k[0], k[1], actors[k], extra[k]
    outlist.append(x)

outlist is the output list.
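To check the loop end to end, here is the same logic wrapped in a function and run on the question's sample data. The name aggregate_actors is hypothetical, and the empty 'extra' dicts are placeholders standing in for the question's elided {..} fields. On Python 3.7+, where dicts keep insertion order, the output follows the order in which each (type, obj_id) pair first appears.

```python
def aggregate_actors(alist):
    """Collect actors per (type, obj_id), as in the loop above."""
    actors = {}
    extra = {}
    for x in alist:
        key = (x['type'], x['obj_id'])
        if key in actors:
            actors[key].append(x['actor'])
        else:
            actors[key] = [x['actor']]
        extra[key] = x['extra']  # the last-seen extra wins, as in the loop above
    outlist = []
    for (type_, obj_id), names in actors.items():
        outlist.append({'type': type_, 'obj_id': obj_id,
                        'actor': names, 'extra': extra[(type_, obj_id)]})
    return outlist

# Hypothetical placeholder extras in place of the question's {..}
alist = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra': {}},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra': {}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra': {}},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra': {}},
]
for row in aggregate_actors(alist):
    print(row)
```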

answered 2013-08-10T01:08:59.960

You should break the problem down into its component parts.

The first thing you need to do is turn all of those actors into lists:

for d in list_of_dicts:  # avoid shadowing the built-in name dict
    d['actor'] = [d['actor']]

Then you need to write a function that checks whether a given pair is already in the list of dicts and, if it is, returns its index:

def check_pair(list_of_dicts, type_, obj_id):
    # Return the index of the matching pair, None otherwise
    for index, d in enumerate(list_of_dicts):
        if d['type'] == type_ and d['obj_id'] == obj_id:
            return index
    return None

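Then you need to create a new list (to hold the new data) and iterate over the old list, either appending each dict to the new list or, if that type and obj_id are already there, adding the actor to the existing dict.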

new_list = []
for d in list_of_dicts:
    j = check_pair(new_list, d['type'], d['obj_id'])
    if j is None:
        new_list.append(d)
    else:
        # d['actor'] is already a list (first step), so extend instead of append
        new_list[j]['actor'].extend(d['actor'])

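I should point out that a list of dicts like this is a rather unconventional structure; you really should look for a way to make your data structure more sensible.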
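check_pair scans the new list from the start for every input dict, so the merge is quadratic in the worst case, and the steps above also modify the original dicts in place. Here is a sketch of the same idea that instead keeps a dict from (type, obj_id) to the position in new_list, and shallow-copies each first-seen dict so the input is left unchanged. The function name merge_by_pair is hypothetical.

```python
def merge_by_pair(list_of_dicts):
    """Merge dicts that share (type, obj_id); 'actor' becomes a list.

    The input dicts are not modified: each first-seen dict is shallow-copied.
    """
    position = {}  # (type, obj_id) -> index into new_list
    new_list = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        j = position.get(key)  # dict lookup replaces the linear check_pair scan
        if j is None:
            merged = dict(d)  # shallow copy keeps any extra fields
            merged['actor'] = [d['actor']]
            position[key] = len(new_list)
            new_list.append(merged)
        else:
            new_list[j]['actor'].append(d['actor'])
    return new_list

data = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob'},
]
merged = merge_by_pair(data)
```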

answered 2013-08-10T01:23:47.613

Here's how I'd do it:

def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try: # it's easier to ask forgiveness than permission
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'],
                   'obj_id': d['obj_id'],
                   'actor': [d['actor']], # note, extra [] around value to make it a list
                   'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)

    return results

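The lookup dict maps tuples of key values to the dicts already included in the results list. If other dicts with the same key are encountered later, the actor values of those output dicts are mutated in place.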
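The reason the in-place mutation works is that lookup and results hold references to the same dict objects, not copies. A tiny hypothetical one-key illustration:

```python
# The same dict object is stored in both lookup and results.
lookup = {}
results = []
val = {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob']}
lookup[('LOVE', 1242)] = val
results.append(val)

# A later dict with the same key appends through lookup...
lookup[('LOVE', 1242)]['actor'].append('dave')

# ...and the change is visible through results, since both refer to one object.
print(results[0]['actor'])  # ['bob', 'dave']
```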

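A more natural solution would be to drop the list-of-dicts data structure and use a single dict that maps (type, obj_id) keys to (actors, extra_fields) values instead. Here's what that looks like: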

def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])

    return results

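This holds most of the data from your list of dicts; only the order may be lost (and since you're merging items from the old list, some of the order is lost anyway). On Python 3.7 and later, dicts preserve insertion order, so the keys stay in the order each (type, obj_id) pair first appears.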

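This is much easier to work with if you'll iterate over the collection later, because you can unpack the tuples (even nested ones) directly in the loop:

combined_dict = merge_dicts2(list_of_dicts)

for (type, obj_id), (actors, extra_fields) in combined_dict.items():
    pass  # do stuff with type, obj_id, actors, extra_fields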
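If some other code still expects the list-of-dicts shape from the question, the combined dict can be turned back into it with the same nested-tuple unpacking. A minimal sketch: the name to_list_of_dicts is hypothetical, the 'extra_fields' key follows merge_dicts, and the empty extras in the sample are placeholders.

```python
def to_list_of_dicts(combined):
    """Convert {(type, obj_id): (actors, extra_fields)} back to a list of dicts."""
    return [
        {'type': type_, 'obj_id': obj_id, 'actor': actors, 'extra_fields': extra_fields}
        for (type_, obj_id), (actors, extra_fields) in combined.items()
    ]

# The shape merge_dicts2 returns for the question's sample (placeholder extras)
combined = {
    ('LOVE', 1242): (['bob', 'dave'], {}),
    ('FAV', 1242): (['sam'], {}),
    ('LOVE', 242): (['bob'], {}),
}
rows = to_list_of_dicts(combined)
```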
answered 2013-08-10T02:34:53.217

One solution: first, get the set of identifiers (the unique combinations of type and obj_id); then, collect the list of actors for each combination.

identifiers = set((item['type'], item['obj_id']) for item in input_list)
output_list = []
for type, obj_id in identifiers:
    output_list.append({
        'type': type,
        'obj_id': obj_id,
        'actor': [item['actor'] for item in input_list
            if item['type'] == type and item['obj_id'] == obj_id]
        })
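Iterating over a set yields the identifiers in an arbitrary order, so output_list need not follow the input order. If order matters, one option (assuming Python 3.7+, where dicts keep insertion order) is dict.fromkeys, which removes duplicates while keeping first-seen order:

```python
input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob'},
]

# dict.fromkeys drops repeated keys but keeps the order they first appear in
identifiers = list(dict.fromkeys((item['type'], item['obj_id']) for item in input_list))

output_list = [
    {'type': type_, 'obj_id': obj_id,
     # compare with ==, not is: equal ints/strings need not be the same object
     'actor': [item['actor'] for item in input_list
               if item['type'] == type_ and item['obj_id'] == obj_id]}
    for type_, obj_id in identifiers
]
```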

Or, using tuples as dictionary keys:

actors_dict = {}
for item in input_list:
    actors_dict.setdefault((item['type'], item['obj_id']), []).append(item['actor'])
output_list = [{'type': type, 'obj_id': obj_id, 'actor': actors}
    for (type, obj_id), actors in actors_dict.items()]

Or, a more flexible way to write it (for example, if you add other values to merge) would be:

output_dict = {}
for item in input_list:
    k = item['type'], item['obj_id']
    if k in output_dict:
        output_dict[k]['actor'].append(item['actor'])
    else:
        item['actor'] = [item['actor']]
        output_dict[k] = item
output_list = list(output_dict.values())  # values() is a view in Python 3

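(Note that this last version also mutates the input: the first dict seen for each key has its 'actor' replaced by a list and is reused in the output.)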

answered 2013-08-10T01:46:16.873