2

I have a list of strings; here is an example:

wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
             'wall_vwh_l0','wall_vwh_l1', 'wall_vwh_broken_l0', 
             'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1', 
             'wall_vpi_broken_l0', 'wall_vpi_broken_l1']

I want to group them by wall type and state (default/broken):

[['wall_l0', 'wall_l1'],['wall_broken_l0', 'wall_broken_l1']]

[['wall_vwh_l0', 'wall_vwh_l1'],['wall_vwh_broken_l0', 'wall_vwh_broken_l1']] 

[['wall_vpi_l0', 'wall_vpi_l1'],['wall_vpi_broken_l0', 'wall_vpi_broken_l1']]

Does anyone know the best way to do this, or know of a Python recipe for it?

4 Answers

4

Here is a small one-liner that does this:

import itertools, re
results = [list(v) for (k, v) in itertools.groupby(sorted(wallList),
        lambda x: re.sub(r'\d+', '0', x))]

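This does not preserve the original order, but otherwise it gives you the same groups as the desired output (as one flat list of pairs, not nested by wall type).

It works by looking at a version of each string in which every run of digits is converted to "0", and grouping the strings whose converted versions are identical.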
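
To make the grouping key concrete, here is a minimal sketch that computes it for a few entries. The helper name `group_key` is only for illustration and does not appear in the answer.

```python
import re

def group_key(name):
    # Replace every run of digits with a single '0', so that
    # 'wall_l0' and 'wall_l1' end up with the same key.
    return re.sub(r'\d+', '0', name)

sample = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1']
keys = [group_key(name) for name in sample]
print(keys)
# ['wall_l0', 'wall_l0', 'wall_broken_l0', 'wall_broken_l0']
```

Because the pattern matches whole runs of digits, a multi-digit level such as `wall_l12` would map to the same key as well.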

answered 2012-09-15T01:53:25.273
3

EDIT: Apparently my answer is only partially correct, because I forgot to organize by "state". The correct answer is from @samy.vilar.

Using itertools.groupby:

>>> from itertools import groupby
>>> [list(g) for k,g in groupby(sorted(wallList), lambda r: r[:-1])]
[['wall_broken_l0', 'wall_broken_l1'], ['wall_l0', 'wall_l1'],
 ['wall_vpi_broken_l0', 'wall_vpi_broken_l1'], ['wall_vpi_l0', 'wall_vpi_l1'],
 ['wall_vwh_broken_l0', 'wall_vwh_broken_l1'], ['wall_vwh_l0', 'wall_vwh_l1']]
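
The `sorted()` call matters here because `itertools.groupby` only merges consecutive items that share a key. A small sketch with a shortened, deliberately interleaved list (the variable names are mine) shows the difference:

```python
from itertools import groupby

# Interleaved on purpose: equal keys are not adjacent.
names = ['wall_l0', 'wall_broken_l0', 'wall_l1', 'wall_broken_l1']

def key(r):
    # Drop the last character; this assumes a single-digit level suffix.
    return r[:-1]

unsorted_groups = [list(g) for k, g in groupby(names, key)]
sorted_groups = [list(g) for k, g in groupby(sorted(names), key)]

print(unsorted_groups)
# [['wall_l0'], ['wall_broken_l0'], ['wall_l1'], ['wall_broken_l1']]
print(sorted_groups)
# [['wall_broken_l0', 'wall_broken_l1'], ['wall_l0', 'wall_l1']]
```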
answered 2012-09-15T01:56:03.430
2

Interesting. First we need to break the list up by wall type, which we can do like this:

>>> from itertools import groupby
>>> wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
         'wall_vwh_l0','wall_vwh_l1', 'wall_vwh_broken_l0', 
         'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1', 
         'wall_vpi_broken_l0', 'wall_vpi_broken_l1']
>>> list(groupby(sorted(wallList), lambda wall: wall.replace('_broken', '')[:-3]))
[('wall', <itertools._grouper object at 0x1004edc50>), ('wall_vpi', <itertools._grouper object at 0x1004edb90>), ('wall_vwh', <itertools._grouper object at 0x1004eda90>)]
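
The grouper objects in that output are lazy, so the members are not visible yet. As a rough sketch, the groups can be materialized into a dictionary to inspect them. The names `wall_type` and `by_type` are illustrative and not part of the answer.

```python
from itertools import groupby

wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
            'wall_vwh_l0', 'wall_vwh_l1', 'wall_vwh_broken_l0',
            'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1',
            'wall_vpi_broken_l0', 'wall_vpi_broken_l1']

def wall_type(wall):
    # Remove the '_broken' marker, then the 3-character '_l<digit>' suffix.
    return wall.replace('_broken', '')[:-3]

# Each group must be consumed before groupby advances, so convert it to a list.
by_type = {key: list(group) for key, group in groupby(sorted(wallList), wall_type)}

print(sorted(by_type))
# ['wall', 'wall_vpi', 'wall_vwh']
print(by_type['wall_vpi'])
# ['wall_vpi_broken_l0', 'wall_vpi_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1']
```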

Great, now that we have the types, let's split each one into broken and unbroken walls.

This is what everything looks like together:

>>> from itertools import groupby
>>> wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
         'wall_vwh_l0','wall_vwh_l1', 'wall_vwh_broken_l0', 
         'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1', 
         'wall_vpi_broken_l0', 'wall_vpi_broken_l1']

>>> values = [[list(v) for k, v in groupby(values, lambda value: '_broken_' in value)] 
...             for key, values in groupby(sorted(wallList), lambda wall: wall.replace('_broken', '')[:-3])]
>>> from pprint import pprint
>>> pprint(values)
[[['wall_broken_l0', 'wall_broken_l1'], ['wall_l0', 'wall_l1']],
 [['wall_vpi_broken_l0', 'wall_vpi_broken_l1'],
  ['wall_vpi_l0', 'wall_vpi_l1']],
 [['wall_vwh_broken_l0', 'wall_vwh_broken_l1'],
  ['wall_vwh_l0', 'wall_vwh_l1']]]

There are surely other ways, but this seems concise.

Here is another way:

>>> from collections import defaultdict
>>> values = defaultdict(lambda : defaultdict(list))
>>> for wall in wallList:
...     if 'broken' in wall:
...         values[wall[:-3].replace('_broken', '')]['broken'].append(wall)
...     else:
...         values[wall[:-3]]['default'].append(wall)
... 
>>> values.items()
[('wall', defaultdict(<type 'list'>, {'default': ['wall_l0', 'wall_l1'], 'broken': ['wall_broken_l0', 'wall_broken_l1']})), ('wall_vpi', defaultdict(<type 'list'>, {'default': ['wall_vpi_l0', 'wall_vpi_l1'], 'broken': ['wall_vpi_broken_l0', 'wall_vpi_broken_l1']})), ('wall_vwh', defaultdict(<type 'list'>, {'default': ['wall_vwh_l0', 'wall_vwh_l1'], 'broken': ['wall_vwh_broken_l0', 'wall_vwh_broken_l1']}))]
>>>

This second method should be faster, since we iterate over the list only once and dictionary lookups take constant time on average. We can also access any set of walls by name as well as by state:

>>> values['wall']['default']
['wall_l0', 'wall_l1']
>>> values['wall_vpi']['default']
['wall_vpi_l0', 'wall_vpi_l1']
>>> values['wall_vpi']['broken']
['wall_vpi_broken_l0', 'wall_vpi_broken_l1']
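
If the nested-list shape from the question is needed, the dictionary can be flattened afterwards. The following is a sketch only: it assumes Python 3.7+ (where dicts keep insertion order), and the names `state`, `wall_type`, and `nested` are mine.

```python
from collections import defaultdict

wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
            'wall_vwh_l0', 'wall_vwh_l1', 'wall_vwh_broken_l0',
            'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1',
            'wall_vpi_broken_l0', 'wall_vpi_broken_l1']

values = defaultdict(lambda: defaultdict(list))
for wall in wallList:
    state = 'broken' if '_broken' in wall else 'default'
    # Same key as above: drop '_broken', then the '_l<digit>' suffix.
    wall_type = wall.replace('_broken', '')[:-3]
    values[wall_type][state].append(wall)

# Types come out in first-seen order, default walls before broken ones.
nested = [[states['default'], states['broken']] for states in values.values()]

print(nested[0])
# [['wall_l0', 'wall_l1'], ['wall_broken_l0', 'wall_broken_l1']]
```

Unlike the sorted `groupby` version, this keeps the types in the order they first appear in `wallList`.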
>>>
answered 2012-09-15T02:35:29.930
-2

Split on _: string.split('_'). If you get 2 fields, you have the degenerate case. If you get 3, group by the middle field of the 3. A dictionary of lists might help, or better yet, collections.defaultdict(list).
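
One possible reading of this suggestion is sketched below. The field counts described above do not line up exactly with the sample data (for example, 'wall_vwh_broken_l0' splits into 4 fields), so this adaptation instead drops the last field and checks for 'broken' among the rest. The (type, state) tuple key and all variable names are assumptions of mine.

```python
from collections import defaultdict

wallList = ['wall_l0', 'wall_l1', 'wall_broken_l0', 'wall_broken_l1',
            'wall_vwh_l0', 'wall_vwh_l1', 'wall_vwh_broken_l0',
            'wall_vwh_broken_l1', 'wall_vpi_l0', 'wall_vpi_l1',
            'wall_vpi_broken_l0', 'wall_vpi_broken_l1']

groups = defaultdict(list)
for name in wallList:
    # Drop the last field (the level, e.g. 'l0') so both levels share a key.
    fields = name.split('_')[:-1]
    state = 'broken' if 'broken' in fields else 'default'
    # Whatever remains after removing 'broken' identifies the wall type.
    wall_type = '_'.join(f for f in fields if f != 'broken')
    groups[(wall_type, state)].append(name)

print(groups[('wall_vwh', 'broken')])
# ['wall_vwh_broken_l0', 'wall_vwh_broken_l1']
print(groups[('wall', 'default')])
# ['wall_l0', 'wall_l1']
```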

answered 2012-09-15T01:49:34.300