
I have a list of tuples like this:

[
    (1, "red"),
    (1, "red,green"),
    (1, "green,blue"),
    (2, "green"),
    (2, "yellow,blue"),
]

I am trying to roll up the data so that I get the following dict as output:

{
    1: ["red", "green", "blue"],
    2: ["green", "yellow", "blue"],
}

Notes: the color strings are combined per primary key (the number), split into a list, and de-duplicated (for example with a set).

I would also like to do the reverse and group by color:

{
    "red": [1],
    "green": [1, 2],
    "yellow": [2],
    "blue": [1, 2],
}

I can clearly do this by looping over all the tuples, but I would like to do it with list / dict comprehensions if possible.


3 Answers


You can use collections.defaultdict:

>>> from collections import defaultdict
>>> lis = [
...     (1, "red"),
...     (1, "red,green"),
...     (1, "green,blue"),
...     (2, "green"),
...     (2, "yellow,blue"),
... ]
>>> dic = defaultdict(set)       # sets only contain unique items
>>> for k, v in lis:
...     dic[k].update(v.split(','))
...
>>> dic
defaultdict(<type 'set'>,
{1: set(['blue', 'green', 'red']),
 2: set(['blue', 'green', 'yellow'])})

Now iterate over dic to build the reverse mapping:

>>> dic2 = defaultdict(list)
>>> for k, v in dic.iteritems():   # Python 2; use dic.items() on Python 3
...     for val in v:
...         dic2[val].append(k)
...
>>> dic2
defaultdict(<type 'list'>,
{'blue': [1, 2],
 'green': [1, 2],
 'yellow': [2],
 'red': [1]})
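
The session above is Python 2 (dict.iteritems and the <type 'set'> repr). As a rough Python 3 sketch of the same two passes, written as a script rather than a REPL session, something like the following should work. The names pairs, by_number, by_color, numbers_to_colors and colors_to_numbers are my own and not from the answer; the sorting is an addition to make the result deterministic, since set iteration order is arbitrary.

```python
from collections import defaultdict

pairs = [
    (1, "red"),
    (1, "red,green"),
    (1, "green,blue"),
    (2, "green"),
    (2, "yellow,blue"),
]

# Pass 1: group colors by number; a set silently drops duplicates.
by_number = defaultdict(set)
for number, colors in pairs:
    by_number[number].update(colors.split(","))

# Pass 2: invert the mapping, visiting numbers in ascending order
# so that each color's list of numbers comes out sorted.
by_color = defaultdict(list)
for number in sorted(by_number):
    for color in by_number[number]:
        by_color[color].append(number)

# Convert to plain dicts; sort the colors for a stable result.
numbers_to_colors = {n: sorted(c) for n, c in by_number.items()}
colors_to_numbers = dict(by_color)
```

Converting to plain dicts at the end avoids the defaultdict side effect of creating an entry whenever a missing key is looked up.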
answered 2013-06-22T19:42:27.040

Another solution, without defaultdict:

>>> input = [
...     (1, "red"),
...     (1, "red,green"),
...     (1, "green,blue"),
...     (2, "green"),
...     (2, "yellow,blue")
... ]
>>> result1 = {s[0]: set(s[1].split(',')) for s in input}
>>> for num, cols in input:
...     result1[num].update(cols.split(','))
... 
>>> print(result1)
{1: {'red', 'green', 'blue'}, 2: {'green', 'yellow', 'blue'}}
>>> 
>>> result2 = dict((k, []) for k in set.union(*result1.values()))
>>> for k,v in result1.items():
...     for val in v:
...         result2[val].append(k)
... 
>>> print(result2)
{'red': [1], 'green': [1, 2], 'yellow': [2], 'blue': [1, 2]}
>>> 

This is not necessarily better than the defaultdict solution. Also, it is not a pure comprehension; it only uses comprehensions as part of the solution.
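
For comparison, a comprehension-only version is possible, at the cost of rescanning the input once per key. The sketch below is one way it might be written; pairs, by_number and by_color are names I made up, and the sorted() call is my addition so that the lists come out in a predictable order.

```python
pairs = [
    (1, "red"),
    (1, "red,green"),
    (1, "green,blue"),
    (2, "green"),
    (2, "yellow,blue"),
]

# number -> set of colors: for each distinct number, rescan the whole
# list and collect every color from the tuples that carry that number.
by_number = {
    n: {c for m, cols in pairs if m == n for c in cols.split(",")}
    for n in {m for m, _ in pairs}
}

# color -> sorted list of numbers: for each distinct color, keep the
# numbers whose color set contains it.
by_color = {
    color: sorted(n for n, cs in by_number.items() if color in cs)
    for color in set().union(*by_number.values())
}
```

Both comprehensions are quadratic in the worst case, so for large inputs the explicit loop is probably still the better choice.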

answered 2021-10-02T18:16:00.317

One solution that uses a dict comprehension to count the occurrences of each unique element:

X = [1,2,1,3,1,4,1,5,2,3,2,4,2,5,3,4,3,5,4,5,5]

Xagg = {xx: sum([int(y==xx) for y in X]) for xx in set(X)}

Xagg
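
The comprehension above walks the whole list once per distinct value. If only the counts are needed, collections.Counter from the standard library should give the same mapping in a single pass. This is a sketch of that alternative, not part of the original answer; Xagg_counter is a name chosen here for illustration.

```python
from collections import Counter

X = [1, 2, 1, 3, 1, 4, 1, 5, 2, 3, 2, 4, 2, 5, 3, 4, 3, 5, 4, 5, 5]

# The dict comprehension from the answer: one full scan of X per distinct value.
Xagg = {xx: sum(int(y == xx) for y in X) for xx in set(X)}

# Counter tallies every element in one pass; dict() turns it into a plain dict.
Xagg_counter = dict(Counter(X))

print(Xagg_counter)  # {1: 4, 2: 4, 3: 4, 4: 4, 5: 5}
```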

answered 2015-08-27T22:07:56.607