3

I have a dictionary whose values are lists of the same length (but of different types), for example:

data = {
    "id": [1,1,2,2,1,2,1,2], 
    "info": ["info1","info2","info3","info4","info5","info6","info7","info8"],       
    "number": [1,2,3,4,5,6,7,8]
}

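Now I want to split it into two parts by id, keeping the corresponding info and number values. That is, I want to end up with two dicts, data1 and data2.

Note: this is only an example. The real dictionary has many keys, and I want to avoid hard-coding key names and iterate over all keys instead.

What is the Pythonic way to do this?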

4 Answers

2

With a list comprehension:

data1 = [ data["info"][idx] for idx, x in enumerate(data["id"]) if x == 1 ]
#data1 = ['info1', 'info2', 'info5', 'info7']

If you want to retrieve all the keys:

data1 = [ { key : data[key][idx] for key in data.keys() }  for idx, x in enumerate(data["id"]) if x == 1 ]
>>> data1
[{'info': 'info1', 'id': 1, 'number': 1}, {'info': 'info2', 'id': 1, 'number': 2}, {'info': 'info5', 'id': 1, 'number': 5}, {'info': 'info7', 'id': 1, 'number': 7}]
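
The list above holds one dict per row. If the goal is the shape the question asks for (one dict of lists per id), a dict comprehension can apply the same id filter to every column. This is a minimal sketch rather than part of the original answer; it assumes every value in data is a list of the same length, and the helper name select_by_id and its id_key parameter are hypothetical:

```python
data = {
    "id": [1, 1, 2, 2, 1, 2, 1, 2],
    "info": ["info1", "info2", "info3", "info4", "info5", "info6", "info7", "info8"],
    "number": [1, 2, 3, 4, 5, 6, 7, 8],
}


def select_by_id(data, wanted, id_key="id"):
    """Return a new dict of lists holding only the rows whose id equals `wanted`."""
    ids = data[id_key]
    return {
        # Pair each column value with the id of its row and keep the matches.
        key: [value for value, row_id in zip(values, ids) if row_id == wanted]
        for key, values in data.items()
    }


data1 = select_by_id(data, 1)
data2 = select_by_id(data, 2)
print(data1)
# {'id': [1, 1, 1, 1], 'info': ['info1', 'info2', 'info5', 'info7'], 'number': [1, 2, 5, 7]}
```

Only the id column is named; every other key is picked up by iterating over data.items(), and the original dictionary is left unchanged.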
answered 2013-04-19T07:42:23.610
1

For working with records, I personally like numpy.recarray.

In [3]: import numpy as np
In [4]: fields = data.keys()
In [8]: recs = zip(*[ lst for k, lst in data.iteritems() ])

In [9]: recs[0]
Out[9]: ('info1', 1, 1)
In [10]: recs[1]
Out[10]: ('info2', 1, 2)

In [21]: ra = np.rec.fromrecords(recs, names = fields )
In [17]: ra
rec.array([('info1', 1, 1), ('info2', 1, 2), ('info3', 2, 3), ('info4', 2, 4),
       ('info5', 1, 5), ('info6', 2, 6), ('info7', 1, 7), ('info8', 2, 8)], 
      dtype=[('info', 'S5'), ('id', '<i8'), ('number', '<i8')])

In [23]: ra[ra.id == 2]
rec.array([('info3', 2, 3), ('info4', 2, 4), ('info6', 2, 6), ('info8', 2, 8)], 
      dtype=[('info', 'S5'), ('id', '<i8'), ('number', '<i8')])

In [24]: ra[ra.id == 2].number
Out[24]: array([3, 4, 6, 8])

In [25]: ra[ra.id == 2][0]
Out[25]: ('info3', 2, 3)

In [26]: ra[ra.id == 2][0].number
Out[26]: 3

If you want to group the records by id in a dictionary, do the following:

{ id: ra[ra.id == id] for id in set(ra.id) }
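
The session above relies on Python 2 behaviour (dict.iteritems, and zip returning a list). For readers on Python 3 who want the same record-style access without NumPy, the following is a rough standard-library sketch using collections.namedtuple in place of a recarray. It is a swapped-in technique, not the method from this answer, and it assumes every key is a valid Python identifier; the names Record, records and by_id are made up for illustration:

```python
from collections import defaultdict, namedtuple

data = {
    "id": [1, 1, 2, 2, 1, 2, 1, 2],
    "info": ["info1", "info2", "info3", "info4", "info5", "info6", "info7", "info8"],
    "number": [1, 2, 3, 4, 5, 6, 7, 8],
}

# One record type whose fields are the dictionary keys (no key is hard-coded).
Record = namedtuple("Record", data.keys())

# zip(*data.values()) walks the columns in parallel and yields one row at a time.
records = [Record(*row) for row in zip(*data.values())]

# Group the records by their id field.
by_id = defaultdict(list)
for rec in records:
    by_id[rec.id].append(rec)

print(by_id[2][0])                       # Record(id=2, info='info3', number=3)
print([rec.number for rec in by_id[2]])  # [3, 4, 6, 8]
```

Unlike the recarray, this keeps rows as plain Python objects, so there is no vectorised filtering; it only mirrors the attribute-style access.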
answered 2013-04-19T08:04:31.393
0
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for ID,info in zip(data["id"],data["info"]):
    res[ID].append(info)


>>> res
defaultdict(<type 'list'>, {1: ['info1', 'info2', 'info5', 'info7'], 2: ['info3', 'info4', 'info6', 'info8']})
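
The loop above groups only the info column. The same defaultdict idea can be extended to every column without naming them. A possible sketch, assuming all lists have equal length; group_columns and id_key are hypothetical names introduced here:

```python
from collections import defaultdict

data = {
    "id": [1, 1, 2, 2, 1, 2, 1, 2],
    "info": ["info1", "info2", "info3", "info4", "info5", "info6", "info7", "info8"],
    "number": [1, 2, 3, 4, 5, 6, 7, 8],
}


def group_columns(data, id_key="id"):
    """Split a dict of equal-length lists into one dict of lists per id value."""
    groups = defaultdict(lambda: defaultdict(list))
    # Every key except the id column is carried over automatically.
    keys = [key for key in data if key != id_key]
    for index, row_id in enumerate(data[id_key]):
        for key in keys:
            groups[row_id][key].append(data[key][index])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row_id: dict(columns) for row_id, columns in groups.items()}


res = group_columns(data)
print(res[1])  # {'info': ['info1', 'info2', 'info5', 'info7'], 'number': [1, 2, 5, 7]}
```

The input dictionary is only read, never modified, so the id list does not have to be put back afterwards.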
answered 2013-04-19T07:38:51.800
0
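from collections import defaultdict

# Remove the id column; note that this mutates data
ids = data.pop('id')
databyid = defaultdict(lambda: defaultdict(list))

# zip(*data.values()) yields one tuple of column values per row
for id, values in zip(ids, zip(*data.values())):
    # kid is the column position, kval is the column's key name
    for kid, kval in enumerate(data.keys()):
        databyid[id][kval].append(values[kid])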

If you need the data back in its original state (with id):

 data['id'] = ids

Result:

>>> databyid[1]
defaultdict(<type 'list'>, {'info': ['info1', 'info2', 'info5', 'info7'], 'number': [1, 2, 5, 7]})
>>> databyid[2]
defaultdict(<type 'list'>, {'info': ['info3', 'info4', 'info6', 'info8'], 'number': [3, 4, 6, 8]})
answered 2013-04-19T09:20:17.637