python - Python Dedup/Merge List of Dicts

Question

Say that I have a list of dicts:

list = [{'name':'john','age':'28','location':'hawaii','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'parker','age':'24','location':'new york','gender':'male'}]

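In this list, 'name' can be considered a unique identifier. My goal is to dedup this list not only for identical dicts (i.e. list[1] and list[2]), but also to merge all dicts that share a single 'name' (i.e. list[0] and list[1]/list[2]). In other words, I want to merge all of the 'name'='john' dicts in my example into a single dict, like so: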

dedup_list = [{'name':'john','age':'28; 32','location':'hawaii; colorado','gender':'male'},
              {'name':'parker','age':'24','location':'new york','gender':'male'} ]

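So far I have tried creating a second list, dedup_list, and iterating through the first list. If the 'name' value does not already exist in one of dedup_list's dicts, I append the dict. The merging part is where I am stuck.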

for dict in list:
    for new_dict in dedup_list:
        if dict['name'] in new_dict:
            # MERGE OTHER DICT FIELDS HERE
        else:
            dedup_list.append(dict) # This will create duplicate values as it iterates through each row of the dedup_list.  I can throw them in a set later to remove?

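My list of dicts will never contain more than 100 items, so an O(n^2) solution is definitely acceptable, though not necessarily ideal. This dedup_list will eventually be written to a CSV, so if there is a solution along those lines, I am all ears.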

Thanks!

score 2 · Accepted Answer

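Well, I was about to craft a solution around defaultdict, but fortunately @hivert already posted the best solution I could come up with, which is in this answer: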

from collections import defaultdict

dicts = [{'a':1, 'b':2, 'c':3},
         {'a':1, 'd':2, 'c':'foo'},
         {'e':57, 'c':3} ]

super_dict = defaultdict(set)  # uses set to avoid duplicates

for d in dicts:
    for k, v in d.items():
        super_dict[k].add(v)

I.e., I am voting to close this question as a dupe of that question.

Note: you won't get values such as '28; 32', but instead a set containing [28, 32], which can then be processed into a csv file as you wish.
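To get from sets to the '28; 32' strings asked for in the question, one option is to group per 'name' first and join the values at the end. The following is only a minimal sketch under a few assumptions: the helper name merge_by_key, the variable name records and the '; ' separator are mine, not part of the linked answer, and it collects values in lists rather than sets so that they keep their first-seen order.

```python
from collections import defaultdict

# Sample data from the question (renamed so it does not shadow the built-in `list`).
records = [
    {'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]


def merge_by_key(rows, key='name', sep='; '):
    """Hypothetical helper: merge rows sharing the same `key` value into one dict."""
    # One bucket per distinct key value; each bucket maps field -> values seen so far.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        bucket = grouped[row[key]]
        for field, value in row.items():
            # Skip values already recorded, so exact duplicates collapse.
            if value not in bucket[field]:
                bucket[field].append(value)
    # Join the distinct values of every field into a single string.
    return [
        {field: sep.join(str(v) for v in values) for field, values in bucket.items()}
        for bucket in grouped.values()
    ]


dedup_list = merge_by_key(records)
```

This is a single pass over the input, so it avoids the nested loop from the question. It relies on dicts preserving insertion order (Python 3.7+) to keep 'john' ahead of 'parker'.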

NB2: to write the csv file, have a look at the DictWriter class.
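As a rough illustration of the DictWriter route, here is a small sketch that writes the merged rows from the question. It assumes every dict has the same four keys, and it writes into an in-memory io.StringIO buffer purely so the result is easy to inspect; the column order in fieldnames is my choice.

```python
import csv
import io

# The merged result the question is aiming for.
dedup_list = [
    {'name': 'john', 'age': '28; 32', 'location': 'hawaii; colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]

# Column order for the CSV; assumed here, adjust as needed.
fieldnames = ['name', 'age', 'location', 'gender']

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()          # first line: the column names
writer.writerows(dedup_list)  # one line per merged dict
csv_text = buffer.getvalue()
```

To write to disk instead, the buffer can be replaced with a file opened as open('dedup.csv', 'w', newline=''), where 'dedup.csv' is just a placeholder name.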
