
I'm having some trouble building a dictionary from multiple matches in a list.

Here is an example list:

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf","123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf","123", "plane", "town"]]

What I'm trying to do is match on the last three items of each inner list and build a dictionary of the groups.

So, based on the list above, the desired output would be:

{1 : [["1.pdf", "123", "train", "plaza"],
      ["5.pdf", "123", "train", "plaza"]],
 2 : [["2.pdf","123", "plane", "town"],
      ["6.pdf","123", "plane", "town"]],
 3 : [["3.pdf", "456", "train", "plaza"]],
 4 : [["4.pdf", "123", "plane", "city"]]}

4 Answers


May I suggest a different output format?

from collections import defaultdict
d = defaultdict(list)

for item in items:
    d[tuple(item[1:])].append(item[0])

This produces a dictionary like this:

{
    ('123', 'train', 'plaza'): ['1.pdf', '5.pdf'],
    ('123', 'plane', 'town'):  ['2.pdf', '6.pdf'],
    ('123', 'plane', 'city'):  ['4.pdf'],
    ('456', 'train', 'plaza'): ['3.pdf']
}
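If you later need the numbered layout from the question, this tuple-keyed dictionary can be converted back. The following is a minimal sketch, assuming Python 3.7+ (where dicts, including `defaultdict`, keep insertion order, so numbering follows first appearance) and that the key is always `item[1:]`. The helper name `to_numbered` is hypothetical:

```python
from collections import defaultdict

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

# Group file names by the last three fields, as in the answer above.
d = defaultdict(list)
for item in items:
    d[tuple(item[1:])].append(item[0])

def to_numbered(grouped):
    """Hypothetical helper: rebuild the full rows and number the groups from 1."""
    return {i: [[name, *key] for name in names]
            for i, (key, names) in enumerate(grouped.items(), 1)}

numbered = to_numbered(d)
print(numbered[1])  # [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']]
```

Storing only the file names keeps the grouped data compact. The full rows can always be rebuilt because every row in a group shares the same key.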
answered 2013-07-30T14:38:51.540

Please ignore my poor naming scheme.

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf","123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf","123", "plane", "town"]]

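final = dict()
for item in items:
    # group the full rows by their last three fields
    key = tuple(item[1:])
    final[key] = final.get(key, []) + [item]

# renumber the groups 1..n (Python 3 dict views can't be indexed, so use enumerate)
new = dict()
for i, group in enumerate(final.values(), 1):
    new[i] = group

for key, group in new.items():
    print(key, ":\n", group)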

Output (shown as the resulting dictionary; group order may vary on Python versions before 3.7):

{1 : [["1.pdf", "123", "train", "plaza"],
      ["5.pdf", "123", "train", "plaza"]],
 2 : [["2.pdf","123", "plane", "town"],
      ["6.pdf","123", "plane", "town"]],
 3 : [["3.pdf", "456", "train", "plaza"]],
 4 : [["4.pdf", "123", "plane", "city"]]}
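The two passes above can be folded into one. The repeated `final.get(...) + [item]`, which builds a new list on every match, can also be replaced by an in-place `append`. This is a sketch assuming Python 3.7+. The function name `number_groups` is hypothetical. Each key gets its group number the first time it is seen, which reproduces the numbering in the question's desired output:

```python
def number_groups(rows):
    """Group rows by their last three fields, numbering groups by first appearance."""
    ids = {}     # key tuple -> group number
    groups = {}  # group number -> list of full rows
    for row in rows:
        key = tuple(row[1:])
        # setdefault stores len(ids) + 1 only when the key is new;
        # otherwise it returns the number already assigned
        gid = ids.setdefault(key, len(ids) + 1)
        groups.setdefault(gid, []).append(row)
    return groups

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

result = number_groups(items)
for key, group in result.items():
    print(key, ":", group)
```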
answered 2013-07-30T14:34:27.380

You can use collections.defaultdict:

>>> from collections import defaultdict
>>> from pprint import pprint
>>> dic = defaultdict(list)
>>> for item in items:
...     dic[tuple(item[1:])].append(item)
...
>>> ans = {i: group for i, group in enumerate(dic.values(), 1)}
>>> pprint(ans)
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']],
 2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']],
 3: [['4.pdf', '123', 'plane', 'city']],
 4: [['3.pdf', '456', 'train', 'plaza']]}

If order matters, use collections.OrderedDict:

>>> from collections import OrderedDict
>>> from pprint import pprint
>>> dic = OrderedDict()
>>> for item in items:
...     dic.setdefault(tuple(item[1:]), []).append(item)
...
>>> ans = {i: group for i, group in enumerate(dic.values(), 1)}
>>> pprint(ans)
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']],
 2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']],
 3: [['3.pdf', '456', 'train', 'plaza']],
 4: [['4.pdf', '123', 'plane', 'city']]}
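On Python 3.7 and later, plain `dict` objects are guaranteed to preserve insertion order, so the same ordered result can be produced without `OrderedDict`. This is a sketch under that version assumption. On older versions, keep `OrderedDict` as shown above:

```python
from pprint import pprint

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

# A plain dict keeps keys in first-insertion order (guaranteed since Python 3.7).
dic = {}
for item in items:
    dic.setdefault(tuple(item[1:]), []).append(item)

# Number the groups 1..n in the order their keys first appeared.
ans = {i: group for i, group in enumerate(dic.values(), 1)}
pprint(ans)
```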
answered 2013-07-30T14:35:40.897

What you're looking for is a groupby operation. If you're using pandas:

In [1]: import pandas as pd

In [2]: items
Out[2]: 
[['1.pdf', '123', 'train', 'plaza'],
 ['2.pdf', '123', 'plane', 'town'],
 ['3.pdf', '456', 'train', 'plaza'],
 ['4.pdf', '123', 'plane', 'city'],
 ['5.pdf', '123', 'train', 'plaza'],
 ['6.pdf', '123', 'plane', 'town']]

In [3]: df = pd.DataFrame.from_records(items)

In [4]: df
Out[4]: 
       0    1      2      3
0  1.pdf  123  train  plaza
1  2.pdf  123  plane   town
2  3.pdf  456  train  plaza
3  4.pdf  123  plane   city
4  5.pdf  123  train  plaza
5  6.pdf  123  plane   town


In [5]: for n, g in df.groupby([1, 2, 3]):
   ...:     print("name", n)
   ...:     print(g)
   ...:
name ('123', 'plane', 'city')
       0    1      2     3
3  4.pdf  123  plane  city
name ('123', 'plane', 'town')
       0    1      2     3
1  2.pdf  123  plane  town
5  6.pdf  123  plane  town
name ('123', 'train', 'plaza')
       0    1      2      3
0  1.pdf  123  train  plaza
4  5.pdf  123  train  plaza
name ('456', 'train', 'plaza')
       0    1      2      3
2  3.pdf  456  train  plaza
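The same groupby idea is available without pandas through the standard library's `itertools.groupby`. This is a minimal sketch. Unlike the pandas version, `itertools.groupby` only merges *adjacent* rows with equal keys, so the list has to be sorted by the key first. As a result, the group numbers follow sorted key order rather than first appearance:

```python
from itertools import groupby
from operator import itemgetter

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

key = itemgetter(1, 2, 3)  # the last three fields of each row

# Sort so that equal keys are adjacent; sorted() is stable, so file order is kept within a group.
rows = sorted(items, key=key)

# Each group iterator is consumed with list() before groupby advances to the next key.
ans = {i: list(group) for i, (_, group) in enumerate(groupby(rows, key=key), 1)}
for i, group in ans.items():
    print(i, group)
```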
answered 2013-07-30T14:38:31.037