6

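I have a Python dictionary containing 3 lists under the keys 'time', 'power' and 'usage'. All the lists have the same number of elements, and all of them are sorted. What I want to do is sum all the elements of the lists 'power' and 'usage' whose indexes correspond to the same value in the list 'time', so that there is only one power and usage sample per time unit.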

For example, transform this dictionary:

{'time': [1, 2, 2, 3, 4, 4, 5],
 'power': [2, 2, 3, 6, 3, 3, 2],
 'usage': [0, 1, 1, 2, 1, 4, 7]}

into this one:

{'time': [1, 2, 3, 4, 5],
 'power': [2, 5, 6, 6, 2],
 'usage': [0, 2, 2, 5, 7]}

I have already written this piece of code, which works, but I don't like it very much:

d = {'time':[1,2,2,3,4,4,5], 'power':[0,1,1,2,1,4,7], 'usage':[2,2,3,6,3,3,2]}
prev = -1
new_d = {'time':[], 'power': [], 'usage':[]}
indexes =  range( len(d['time']) )

for i in indexes:
  if d['time'][i]!=prev:
    new_d['time'].append(d['time'][i])
    new_d['power'].append(d['power'][i])
    new_d['usage'].append(d['usage'][i])
  else:
    last_power = len( new_d['power'] ) - 1
    last_usage = len( new_d['usage'] ) - 1
    new_d['power'][last_power]+=d['power'][i]
    new_d['usage'][last_usage]+=d['usage'][i]
  prev=d['time'][i]

print d
print new_d

Is there a more pythonian way to do this, one that is simpler and more comprehensive?

4

9 Answers

3

A robust solution that handles any number of extra fields, grouped on the sorted 'time' field (written as a function):

def aggregate(old_d, sort_key='time'):
    new_d = dict((k, []) for k in old_d)
    prev = None
    curr = None
    for i in range(len(old_d[sort_key])):
        curr = old_d[sort_key][i]
        for key, lst in new_d.iteritems(): # .items() in Python 3+
            if prev == curr:
                if key != sort_key:           
                    lst[-1] += old_d[key][i]
            else:
                lst.append(old_d[key][i])
        prev = curr
    return new_d

Using your dictionary:

d = {'time': [1, 2, 2, 3, 4, 4, 5],
     'power': [2, 2, 3, 6, 3, 3, 2],
     'usage': [0, 1, 1, 2, 1, 4, 7]}

print aggregate(d)
>>>
{'usage': [0, 2, 2, 5, 7], 'power': [2, 5, 6, 6, 2], 'time': [1, 2, 3, 4, 5]}
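
For readers on Python 3, the same loop can be sketched as follows. This is only a port under stated assumptions: iteritems() becomes items(), print becomes a function, and a sentinel object replaces None as the initial prev value. The 'cost' field is hypothetical, added only to show that extra fields are summed as well.

```python
def aggregate(old_d, sort_key='time'):
    """Sum every field over runs of equal values in old_d[sort_key]."""
    new_d = {k: [] for k in old_d}
    prev = object()  # sentinel that never compares equal to a real key value
    for i, curr in enumerate(old_d[sort_key]):
        for key, lst in new_d.items():
            if curr == prev:
                # same time value as the previous row: accumulate
                if key != sort_key:
                    lst[-1] += old_d[key][i]
            else:
                # new time value: start a new output slot
                lst.append(old_d[key][i])
        prev = curr
    return new_d


d = {'time': [1, 2, 2, 3, 4, 4, 5],
     'power': [2, 2, 3, 6, 3, 3, 2],
     'usage': [0, 1, 1, 2, 1, 4, 7],
     'cost': [1, 1, 1, 1, 1, 1, 1]}  # hypothetical extra field

result = aggregate(d)
print(result)
```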
answered 2013-04-10T13:03:38.583
3

Here's one that can handle an arbitrary dictionary... (where d is your dict...)

from itertools import groupby, imap
from operator import itemgetter

def group_dict_by(mapping, field, agg=sum):
    grouper = mapping[field]
    new_grouper = []
    accum = {k: [] for k in mapping.viewkeys() - [field]}
    for key, grp in groupby(enumerate(grouper), itemgetter(1)):
        new_grouper.append(key)
        idx = [g[0] for g in grp]   
        for dk, dv in accum.iteritems():
            dv.append(agg(imap(mapping[dk].__getitem__, idx)))

    accum[field] = new_grouper
    return accum

print group_dict_by(d, 'time')
# {'usage': [0, 2, 2, 5, 7], 'power': [2, 5, 6, 6, 2], 'time': [1, 2, 3, 4, 5]}
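
The function above relies on Python 2 names (imap, viewkeys, iteritems). A rough Python 3 sketch of the same idea is shown below, assuming those are replaced by a generator expression, a dict comprehension and items(). It also passes max as the agg argument to illustrate that the aggregation is pluggable; the variable names summed and peaks are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def group_dict_by(mapping, field, agg=sum):
    """Collapse runs of equal mapping[field] values, aggregating other fields."""
    new_grouper = []
    accum = {k: [] for k in mapping if k != field}
    for key, grp in groupby(enumerate(mapping[field]), key=itemgetter(1)):
        new_grouper.append(key)
        idx = [i for i, _ in grp]  # positions that belong to this run
        for name, out in accum.items():
            out.append(agg(mapping[name][i] for i in idx))
    accum[field] = new_grouper
    return accum


d = {'time': [1, 2, 2, 3, 4, 4, 5],
     'power': [2, 2, 3, 6, 3, 3, 2],
     'usage': [0, 1, 1, 2, 1, 4, 7]}

summed = group_dict_by(d, 'time')           # default: sum per time value
peaks = group_dict_by(d, 'time', agg=max)   # largest sample per time value
print(summed)
print(peaks)
```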
answered 2013-04-10T14:50:41.007
2

Using itertools.groupby, zip and some list comprehensions:

In [55]: dic={'time': [1, 2, 2, 3, 4, 4, 5],
   ....:  'power': [2, 2, 3, 6, 3, 3, 2],
   ....:  'usage': [0, 1, 1, 2, 1, 4, 7]}

In [56]: from itertools import groupby

In [57]: from operator import itemgetter

In [58]: zip1=zip(dic['time'],dic['power']) #use `itertools.izip` for performance    

In [59]: [sum(x[1] for x in v) for k,v in groupby(zip1,key=itemgetter(0))]
Out[59]: [2, 5, 6, 6, 2]

In [60]: zip2=zip(dic['time'],dic['usage'])

In [61]: [sum(x[1] for x in v) for k,v in groupby(zip2,key=itemgetter(0))]
Out[61]: [0, 2, 2, 5, 7]

In [64]: timee=[k for k,v in groupby(dic['time'])]

In [65]: timee
Out[65]: [1, 2, 3, 4, 5]

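zip1 is [(1, 2), (2, 2), (2, 3), (3, 6), (4, 3), (4, 3), (5, 2)]. You can now group its elements by their first item using itertools.groupby, and then take the sum of the second element of every tuple in each returned group.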

In [75]: new_time=[k for k,v in groupby(dic['time'])]

In [76]: new_power=[sum(x[1] for x in v) for k,v in groupby(zip1,key=itemgetter(0))]

In [77]: new_usage=[sum(x[1] for x in v) for k,v in groupby(zip2,key=itemgetter(0))]

In [80]: dict(zip(('time','power','usage'),(new_time,new_power,new_usage)))
Out[80]: {'power': [2, 5, 6, 6, 2], 'time': [1, 2, 3, 4, 5], 'usage': [0, 2, 2, 5, 7]}
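
To make the grouping step in this session easier to follow, here is a small Python 3 sketch that looks at groupby in isolation. The shortened sample lists and the names pairs, runs and unsorted_keys are illustrative only. It also shows why the sorted input matters: groupby merges only adjacent items with equal keys.

```python
from itertools import groupby
from operator import itemgetter

pairs = list(zip([1, 2, 2, 3], [2, 2, 3, 6]))  # (time, power) tuples

# Each group is a run of consecutive tuples that share the same first item.
runs = [(k, [p for _, p in g]) for k, g in groupby(pairs, key=itemgetter(0))]
print(runs)  # [(1, [2]), (2, [2, 3]), (3, [6])]

# groupby only merges adjacent equal keys, so unsorted input splits a key.
unsorted_keys = [k for k, _ in groupby([2, 1, 2])]
print(unsorted_keys)  # [2, 1, 2]
```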
answered 2013-04-10T13:14:09.627
1

I would first group the values in a new dictionary and then sum them. It takes more space, but it is simple and fast:

from collections import defaultdict
from itertools import groupby

power = defaultdict(list)
usage = defaultdict(list)

# data is the dictionary from the question
for i, time in enumerate(data['time']):
    power[time].append(data['power'][i])
    usage[time].append(data['usage'][i])

times = [key for key,group in groupby(data['time'])]

print {    'time': times,
           'power' : [sum(power[time]) for time in times],
           'usage' : [sum(usage[time]) for time in times]
       }
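
If the extra space is a concern, one possible variation (not this answer's own code) keeps a running total per time value in a defaultdict(int) instead of storing lists. The sketch below is Python 3 and assumes the output should be in ascending time order, which sorted() provides; data is the example dictionary from the question.

```python
from collections import defaultdict

data = {'time': [1, 2, 2, 3, 4, 4, 5],
        'power': [2, 2, 3, 6, 3, 3, 2],
        'usage': [0, 1, 1, 2, 1, 4, 7]}

power = defaultdict(int)  # missing keys start at 0
usage = defaultdict(int)
for t, p, u in zip(data['time'], data['power'], data['usage']):
    power[t] += p
    usage[t] += u

times = sorted(power)  # assumption: ascending time order is wanted
result = {'time': times,
          'power': [power[t] for t in times],
          'usage': [usage[t] for t in times]}
print(result)
```

Because the totals are keyed by time value, this variation does not depend on the input being sorted.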
answered 2013-04-10T13:15:43.923
1

You can use the following approach for any number of extra fields:

from itertools import groupby
from operator import itemgetter

dic = {'time': [1, 2, 2, 3, 4, 4, 5],
 'power': [2, 2, 3, 6, 3, 3, 2],
 'usage': [0, 1, 1, 2, 1, 4, 7]}

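aggregated = {}

for name, values in dic.items():
    if name == 'time':
        # the grouping key itself is deduplicated, not summed
        aggregated[name] = [k for k, _ in groupby(values)]
    else:
        # pair each value with its time explicitly; dict order is not relied on
        aggregated[name] = [sum(y[1] for y in x)
                            for k, x in groupby(zip(dic['time'], values),
                                                key=itemgetter(0))]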

This is an improved version borrowed from Ashwini Chaudhary's answer.

answered 2013-04-10T13:13:44.343
1
>>> from itertools import groupby
>>> from operator import itemgetter
>>> d = {'usage': [0, 1, 1, 2, 1, 4, 7], 'power': [2, 2, 3, 6, 3, 3, 2], 'time': [1, 2, 2, 3, 4, 4, 5]}
>>> groups = groupby(zip(d['time'], d['power'], d['usage']), key=itemgetter(0))
>>> lists = zip(*[[k] + map(sum, zip(*g)[1:]) for k, g in groups])
>>> dict(zip(('time', 'power', 'usage'), lists))
{'usage': (0, 2, 2, 5, 7), 'power': (2, 5, 6, 6, 2), 'time': (1, 2, 3, 4, 5)}

For a variable number of keys, I added the keys variable to avoid rewriting them:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> keys = ('time', 'power', 'usage')
>>> groups = groupby(zip(*[d[k] for k in keys]), key=itemgetter(0))
>>> lists = zip(*[[k] + map(sum, zip(*g)[1:]) for k, g in groups])
>>> dict(zip(keys, lists))
{'usage': (0, 2, 2, 5, 7), 'power': (2, 5, 6, 6, 2), 'time': (1, 2, 3, 4, 5)}
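
Note that these sessions are Python 2: in Python 3, zip and map return iterators, so zip(*g)[1:] raises a TypeError and [k] + map(...) fails as well. A possible Python 3 adaptation, sketched here with the intermediate name rows and with lists rather than tuples in the result (both are choices of this sketch, not of the answer), would be:

```python
from itertools import groupby
from operator import itemgetter

d = {'usage': [0, 1, 1, 2, 1, 4, 7],
     'power': [2, 2, 3, 6, 3, 3, 2],
     'time': [1, 2, 2, 3, 4, 4, 5]}
keys = ('time', 'power', 'usage')

# Rows of (time, power, usage), grouped by the time value in position 0.
groups = groupby(zip(*(d[k] for k in keys)), key=itemgetter(0))

# For each group, transpose its rows into columns and sum every column
# except the first one, which holds the repeated time value.
rows = [[k] + [sum(col) for col in list(zip(*g))[1:]] for k, g in groups]

# Transpose back so that each key maps to one list.
new_d = dict(zip(keys, (list(col) for col in zip(*rows))))
print(new_d)
```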
answered 2013-04-10T13:50:24.380
1
from itertools import izip

def m_(time, power, usage):

    time_, power_, usage_ = [], [], []

    for t, p, u in izip(time, power, usage):

        if not time_:
            time_.append( t )
            power_.append( 0 )
            usage_.append( 0 )

        if time_[-1] == t:
            power_[-1] += p
            usage_[-1] += u
        else:
            time_.append( t )
            power_.append( p )
            usage_.append( u )

    time[:], power[:], usage[:] = time_, power_, usage_

if __name__ == '__main__':
    d = {'time':[1,2,2,3,4,4,5], 'power':[0,1,1,2,1,4,7], 'usage':[2,2,3,6,3,3,2]}
    m_(**d)
    print d
answered 2013-04-10T13:56:16.763
0

Here is the "pythonian way" :):

d = {'time': [1, 2, 2, 3, 4, 4, 5],
 'power': [2, 2, 3, 6, 3, 3, 2],
 'usage': [0, 1, 1, 2, 1, 4, 7]}

new_d = {'time' : [], 'power' : [], 'usage' : []}

for time in sorted(set(d['time'])):  # sorted(): set iteration order is not guaranteed
    new_d['time'].append(time)
    new_d['power'].append(sum(value for index, value in enumerate(d['power']) if d['time'][index] == time)) 
    new_d['usage'].append(sum(value for index, value in enumerate(d['usage']) if d['time'][index] == time))

print new_d
answered 2013-04-10T13:09:23.297
0

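Below is the exact solution to my problem. I based it on jamylak's answer, which I think is the most "pythonian" and most comprehensive of all those given. What I did was adapt his code so that it handles multiple fields, i.e. multiple lists in the dictionary. I have accepted jamylak's answer, and here is the solution for multiple fields: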

from itertools import groupby              
from operator import itemgetter

d = {'power': [2, 2, 3, 6, 3, 3, 2],
     'usage': [0, 1, 1, 2, 1, 4, 7],
     'time': [1, 2, 2, 3, 4, 4, 5]}

# construct a list with all the key names (starting from 'time')
keys = ['time'] + [key for key in d.keys() if key!='time']
# construct a list with all the keys' lists (starting from the one of 'time')
keys_lists = [ d['time'] ] + [d[key] for key in d.keys() if key!='time']
groups = groupby(zip(*keys_lists), key=itemgetter(0))
lists = zip(*[[k] + map(sum, zip(*g)[1:]) for k, g in groups])
new_d = dict(zip((keys), lists))
print new_d
answered 2013-04-10T23:02:52.553