
So I have a list of tuples like this:

[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General'),
    ('1b', 'General', 'Basic'),
    ('1b', 'General', 'Basic', 'Data'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None'),
    ...
]

Each tuple has a length of 1 to 5. I need to recursively reduce the list so that it ends up as:

[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None'),
    ...
]

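Basically, if the next row matches everything in the previous row plus one more item, remove the previous row, so that only the longest tuples with the same hierarchy remain.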

So, as shown in my example, 1c (the first item in the tuple) spans 3 rows, so it is reduced to the longest one.


3 Answers


Group the tuples on their first element; use itertools.groupby() and operator.itemgetter() to make creating the key easy.

Then filter each group separately:

from itertools import groupby, chain
from operator import itemgetter

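def filtered_group(group):
    group = list(group)  # groupby yields a one-shot iterator, so materialise it
    maxlen = max(len(l) for l in group)
    # keep only the longest tuple(s) in this group
    return [l for l in group if len(l) == maxlen]

# inputlist is your list of tuples; consecutive tuples sharing a first element form one group
filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
output = list(chain.from_iterable(filtered))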

Demo:

>>> from itertools import groupby, chain
>>> from operator import itemgetter
>>> from pprint import pprint
>>> def filtered_group(group):
...     group = list(group)
...     maxlen = max(len(l) for l in group)
...     return [l for l in group if len(l) == maxlen]
... 
>>> filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
>>> pprint(list(chain.from_iterable(filtered)))
[('Worksheet',),
 ('1a', 'Calculated'),
 ('None', 'None', 'None', 'None', 'None'),
 ('1b', 'General', 'Basic', 'Data', 'Line 1'),
 ('1b', 'General', 'Basic', 'Data', 'Line 2'),
 ('None', 'None', 'None', 'None', 'None'),
 ('1c', 'General', 'Basic', 'Data'),
 ('None', 'None', 'None', 'None', 'None'),
 ('2', 'Active', 'Passive'),
 ('None', 'None', 'None', 'None', 'None')]
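groupby() only merges runs of consecutive items that share a key. That is why each 'None' separator row ends up in its own group and survives the filtering. Here is a small check with a shortened sample; the rows variable is hypothetical:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ('1b', 'General'),
    ('1b', 'General', 'Basic'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General'),
    ('None', 'None', 'None', 'None', 'None'),
]

# groupby() starts a new group whenever the key changes, so the two
# 'None' rows land in separate groups instead of one merged group.
keys = [k for k, _ in groupby(rows, key=itemgetter(0))]
print(keys)  # ['1b', 'None', '1c', 'None']
```

For the same reason, the list should not be sorted before grouping. Sorting would pull all the 'None' rows together and lose the original order.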
answered 2013-10-23T13:58:01.360
from pprint import pprint

l=[
    ('Worksheet',),
    ('1a', 'Calculated'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1b', 'General'),
    ('1b', 'General', 'Basic'),
    ('1b', 'General', 'Basic', 'Data'),
    ('1b', 'General', 'Basic', 'Data', 'Line 1'),
    ('1b', 'General', 'Basic', 'Data', 'Line 2'),
    ('None', 'None', 'None', 'None', 'None'),
    ('1c', 'General'),
    ('1c', 'General', 'Basic'),
    ('1c', 'General', 'Basic', 'Data'),
    ('None', 'None', 'None', 'None', 'None'),
    ('2', 'Active'),
    ('2', 'Active', 'Passive'),
    ('None', 'None', 'None', 'None', 'None')
    #...
]

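i = 0
while i < len(l) - 1:
    l0 = l[i]
    l1 = l[i + 1]
    # the next row is this row plus exactly one extra item: drop this row
    if len(l1) == len(l0) + 1 and l1[:-1] == l0:
        del l[i]  # don't advance; l[i] is now the former next row
    else:
        i += 1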

pprint(l)

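Logic: compare each row (except the last) with the next one. If the next row is the same plus one additional item, delete the current row. Otherwise, advance to the next row.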
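The same pairwise rule can also be written without mutating the list, by pairing each row with its successor via zip(). This is a minimal sketch; drop_extended_rows is a hypothetical name and does not come from the answer:

```python
def drop_extended_rows(rows):
    """Keep a row unless the row right after it is that row plus one item.

    Hypothetical helper: a non-mutating version of the while loop above.
    """
    kept = [
        cur
        for cur, nxt in zip(rows, rows[1:])
        if not (len(nxt) == len(cur) + 1 and nxt[:-1] == cur)
    ]
    # The last row has no successor, so it is always kept.
    return kept + rows[-1:]
```

The loop never steps back after a deletion, so each original row is only compared with its original successor. This comprehension should therefore keep exactly the rows the loop keeps, while leaving the input list untouched.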

This isn't a recursive solution, but it could be reworked into one. It is a filtering operation in which the condition needs the next item.

Just for fun, here is a recursive Haskell version (this kind of recursion is efficient in Haskell and Scheme, but not in Python):

prefixfilt :: Eq a => [[a]] -> [[a]]
prefixfilt [] = []
prefixfilt [x] = [x]
prefixfilt (x0:x1:xs) =
    if x0 == init x1 then rest else (x0:rest)
    where rest = prefixfilt (x1:xs)
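For comparison, here is a sketch that transcribes prefixfilt into Python. It recurses over an index rather than slicing; the start parameter is an addition not present in the Haskell version. As noted above, CPython does not optimise this recursion, so long inputs can hit the default recursion limit of about 1000 frames.

```python
def prefixfilt(rows, start=0):
    """Recursive Python transcription of the Haskell prefixfilt."""
    if len(rows) - start <= 1:          # the [] and [x] cases
        return list(rows[start:])
    x0, x1 = rows[start], rows[start + 1]
    rest = prefixfilt(rows, start + 1)  # where rest = prefixfilt (x1:xs)
    # x1[:-1] plays the role of Haskell's `init x1`
    return rest if x0 == x1[:-1] else [x0] + rest
```

The [x0] + rest step copies the partial result on every call, so this is quadratic. It is meant to mirror the Haskell structure, not to be the fast option.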
answered 2013-10-23T14:12:28.693
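def is_subtuple(tup1, tup2):
    '''Return True if all the elements of tup1 are consecutively in the strictly longer tup2.'''
    # Use <= so identical rows (e.g. the repeated 'None' separators) don't count as sub-tuples.
    if len(tup2) <= len(tup1): return False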
    try:
        offset = tup2.index(tup1[0])
    except ValueError:
        return False
    # This could be wrong if tup1[0] is in tup2, but doesn't start the subtuple.
    # You could solve this by recurring on the rest of tup2 if this is false, but
    # it doesn't apply to your input data.
    return tup1 == tup2[offset:offset+len(tup1)] 

Then just filter your input list (named l here):

[t for i, t in enumerate(l) if not any(is_subtuple(t, t2) for t2 in l[i+1:])]

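Now, this list comprehension assumes the input list is ordered the way you showed it, with sub-tuples coming before the tuples that contain them. It's also a bit expensive (O(n**2), I think), but it will get the job done.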
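The comment in is_subtuple notes that only the first occurrence of tup1[0] is checked. The sketch below is a more thorough check that tries every start offset; it is an iterative substitute for the recursion the comment suggests. It can stand in for the original in the list comprehension above. It also treats only strictly longer tuples as containers, so identical rows such as the repeated 'None' separators are not matched against each other.

```python
def is_subtuple(tup1, tup2):
    """Return True if tup1 appears as a contiguous run inside a strictly longer tup2."""
    n = len(tup1)
    if len(tup2) <= n:
        return False
    # Try every start position, not just the first match of tup1[0].
    return any(tup2[i:i + n] == tup1 for i in range(len(tup2) - n + 1))
```

For example, ('b', 'c') is found inside ('a', 'b', 'x', 'b', 'c') even though the first 'b' does not start the match.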

answered 2013-10-23T14:27:47.333