python - Python list_of_tuples: sum second val of each tuple, only if first val of tuple == something

Question

I have a list of "tagged" tuples...where each tuple is (tag_id, value)...like so:

my_list = [(tag_A, 100), (tag_A, 200), (tag_A, 300), (tag_A, 400), (tag_B, 400), (tag_B, 600)]

I want to sum the values of each tuple with the same tag...so that:

sum_of_all_values_with_tag_A() = 1000

sum_of_all_values_with_tag_B() = 1000

I can't figure out a simple Pythonic way of doing that.

sum(set(value for tag_id, value in my_list))

...returns the sum of ALL the values.

I suppose I can wrap that with a for or a while loop, so that only tuples with the tag I want to sum are touched by that expression...? I need to sum the values associated with both tags...resulting in two different totals, differentiated as above. But can't quite grok an elegant syntax for such a thing.

This is happening inside of a pre-existing function. Would be great to do it without nesting functions.

Any suggestions are appreciated!

score 9 · Accepted Answer

Use a generator expression with a filter to sum the values for a single tag:

sum(val for tag, val in my_list if tag == tag_A)
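
If both totals are needed, the same filtered sum can be repeated once per tag, for example inside a dict comprehension. This is only a sketch: it assumes the tags are plain strings (the question never defines tag_A and tag_B), and the names `tags` and `totals` are illustrative.

```python
# Assumption: tags are plain strings; the question leaves tag_A/tag_B undefined.
tag_A, tag_B = 'A', 'B'
my_list = [(tag_A, 100), (tag_A, 200), (tag_A, 300), (tag_A, 400),
           (tag_B, 400), (tag_B, 600)]

# Collect the distinct tags, then run one filtered sum per tag.
tags = {tag for tag, _ in my_list}
totals = {t: sum(val for tag, val in my_list if tag == t) for t in tags}

print(totals[tag_A])  # 1000
print(totals[tag_B])  # 1000
```

This rescans the list once per distinct tag, which is fine for a handful of tags; with many tags, a single pass that accumulates into a dictionary is likely the better fit.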

To get the totals for all tags at once, you could sort on the tags and then use itertools.groupby to create per-tag groups and sum each one:

from itertools import groupby
from operator import itemgetter

key = itemgetter(0)  # tag
sums = {tag: sum(tup[1] for tup in group)
        for tag, group in groupby(sorted(my_list, key=key), key=key)}

This produces a dictionary mapping each tag to its sum:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> tag_A, tag_B = 'A', 'B'
>>> my_list = [(tag_A, 100), (tag_A, 200), (tag_A, 300), (tag_A, 400), (tag_B, 400), (tag_B, 600)]
>>> key = itemgetter(0)  # tag
>>> sums = {tag: sum(tup[1] for tup in group)
...         for tag, group in groupby(sorted(my_list, key=key), key=key)}
>>> print(sums)
{'A': 1000, 'B': 1000}
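
The sort matters because groupby only merges consecutive items with equal keys. A small sketch of what happens without it, using a hypothetical interleaved list (`pairs` is not from the question):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input with interleaved tags, to show why sorting matters.
pairs = [('A', 100), ('B', 400), ('A', 200)]
key = itemgetter(0)  # tag

# Without sorting, groupby starts a new group every time the tag changes,
# so 'A' shows up as two separate groups.
unsorted_groups = [(tag, [v for _, v in group])
                   for tag, group in groupby(pairs, key=key)]
print(unsorted_groups)  # [('A', [100]), ('B', [400]), ('A', [200])]

# Sorting first makes equal tags adjacent, so each tag forms exactly one group.
sorted_sums = {tag: sum(v for _, v in group)
               for tag, group in groupby(sorted(pairs, key=key), key=key)}
print(sorted_sums['A'])  # 300
print(sorted_sums['B'])  # 400
```

In a dict comprehension, the unsorted version would silently keep only the last group's sum for a repeated tag, so the missing sort produces wrong totals rather than an error.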

score 6 · Accepted Answer

Approach

Group the values by tag in a defaultdict(list), then sum each group.

Code

from collections import defaultdict
my_list = [('tag_A', 100), ('tag_A', 200), ('tag_A', 300), ('tag_A', 400), ('tag_B', 400), ('tag_B', 600)]

d = defaultdict(list)
for tag, num in my_list:
    d[tag].append(num)

Test

>>> from collections import defaultdict
>>> my_list = [('tag_A', 100), ('tag_A', 200), ('tag_A', 300), ('tag_A', 400), ('tag_B', 400), ('tag_B', 600)]
>>> 
>>> d = defaultdict(list)
>>> for tag, num in my_list:
...     d[tag].append(num)
... 
>>> from pprint import pprint
>>> pprint(dict(d))
{'tag_A': [100, 200, 300, 400], 'tag_B': [400, 600]}
>>> 
>>> pprint({k: sum(v) for k, v in d.items()})
{'tag_A': 1000, 'tag_B': 1000}

Alternative summary routine

def summarize_by_tag(d):
    # Print each tag followed by the sum of its grouped values.
    for k, v in d.items():
        print(k, sum(v))

>>> summarize_by_tag(d)
tag_A 1000
tag_B 1000

score 3 · Accepted Answer

As in the other answers, I would use a defaultdict, but unless you need the groups again later, just sum the values as you group them. my_list could then be a very large iterable, because you never store the whole thing in memory.

from collections import defaultdict
my_list = [('tag_A', 100), ('tag_A', 200), ('tag_A', 300), ('tag_A', 400), ('tag_B', 400), ('tag_B', 600)]
result = defaultdict(int)
for tag, value in my_list:
    result[tag] += value
print(result)

defaultdict(<class 'int'>, {'tag_A': 1000, 'tag_B': 1000})
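
Because the loop keeps only one running total per tag, the input does not have to be a list at all. A sketch with a generator standing in for a large source (the function `read_pairs` and its 'even'/'odd' tags are hypothetical, purely for illustration):

```python
from collections import defaultdict

def read_pairs():
    """Hypothetical stand-in for a large source, yielding (tag, value) lazily."""
    for i in range(1000):
        yield ('even' if i % 2 == 0 else 'odd', i)

totals = defaultdict(int)
for tag, value in read_pairs():
    totals[tag] += value  # only the running total per tag is kept in memory

print(totals['even'])  # 249500
print(totals['odd'])   # 250000
```

The pairs are consumed one at a time and discarded, so memory use grows with the number of distinct tags rather than with the number of tuples.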

score 1 · Accepted Answer

Without importing anything:

mysum = {}
my_list = [('tag_A', 100), ('tag_A', 200), ('tag_A', 300), ('tag_A', 400), ('tag_B', 400), ('tag_B', 600)]
for tag, value in my_list:
    # dict.get returns 0 the first time a tag is seen
    mysum[tag] = mysum.get(tag, 0) + value
print(mysum)

Output:

{'tag_A': 1000, 'tag_B': 1000}
