python - python 2.6-有效地删除和计算字典列表中的重复项

Question

我正在尝试有效地改变：

[{'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 2}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'haltlo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'hallo world', 'num': 1}]

进入没有重复的字典列表和重复的计数：

[{'text': 'hallo world', 'num': 2, 'count':1}, 
 {'text': 'hallo world', 'num': 1, 'count':5}, 
 {'text': 'haltlo world', 'num': 1, 'count':1}]

到目前为止，我有以下内容可以找到重复项：

result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in li)]

它返回：

[{'text': 'hallo world', 'num': 2}, 
 {'text': 'hallo world', 'num': 1}, 
 {'text': 'haltlo world', 'num': 1}]

谢谢！

score 6 · Accepted Answer

注意：现在使用frozenset这意味着字典中的项目必须是可散列的。

>>> from collections import defaultdict
>>> from itertools import chain
>>> data = [{'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 2},  {'text': 'hallo world', 'num': 1}, {'text': 'haltlo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}, {'text': 'hallo world', 'num': 1}]
>>> c = defaultdict(int)
>>> for d in data:
        c[frozenset(d.iteritems())] += 1


>>> [dict(chain(k, (('count', count),))) for k, count in c.iteritems()]
[{'count': 1, 'text': 'haltlo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}, {'count': 5, 'text': 'hallo world', 'num': 1}]

score 6 · Accepted Answer

我将使用我的最爱之一itertools：

from itertools import groupby

def canonicalize_dict(x):
    "Return a (key, value) list sorted by the hash of the key"
    return sorted(x.items(), key=lambda x: hash(x[0]))

def unique_and_count(lst):
    "Return a list of unique dicts with a 'count' key added"
    grouper = groupby(sorted(map(canonicalize_dict, lst)))
    return [dict(k + [("count", len(list(g)))]) for k, g in grouper]

a = [{'text': 'hallo world', 'num': 1},  
     #....
     {'text': 'hallo world', 'num': 1}]

print unique_and_count(a)

输出

[{'count': 5, 'text': 'hallo world', 'num': 1}, 
{'count': 1, 'text': 'hallo world', 'num': 2}, 
{'count': 1, 'text': 'haltlo world', 'num': 1}]

正如 gnibbler 指出的那样，即使键相同，也可能具有不同的d1.items()键d2.items()顺序，因此我引入了该功能来解决此问题。canonical_dict

score 1 · Accepted Answer

想要简单的解决方案而不使用任何内置函数，

>>> d = [{'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 2}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'haltlo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}, 
...  {'text': 'hallo world', 'num': 1}]
>>> 
>>> def unique_counter(filesets):
...      for i in filesets:
...          i['count'] = sum([1 for j in filesets if j['num'] == i['num']])
...      return {k['num']:k for k in filesets}.values()
... 
>>> unique_counter(d)
[{'count': 6, 'text': 'hallo world', 'num': 1}, {'count': 1, 'text': 'hallo world', 'num': 2}]

python - python 2.6-有效地删除和计算字典列表中的重复项

3 回答 3

Related

Reference