Using the following list of lists (3 individual lists inside one big list):

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

I need to cycle through each list and check whether element 0 and element 1 both match those of any other list. If BOTH match, the later list should be removed (so in my example it removes the middle list).

Each time it removes an item from the list it needs to update the list.

Anyone have any ideas?

3 Answers

Use a dict keyed on the first two items:

>>> lis = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in lis:
...     key = tuple(item[:2])
...     if key not in dic:
...         dic[key] = item
...         
>>> list(dic.values())
[
 ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
]
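
On Python 3.7 and later a plain dict also keeps insertion order, so the same idea can be sketched without OrderedDict. This is only a minimal sketch under that assumption; the function name `dedupe_by_first_two` and the shortened sample rows are illustrative and not part of the original answer.

```python
def dedupe_by_first_two(rows):
    """Keep the first row seen for each (row[0], row[1]) pair."""
    kept = {}
    for row in rows:
        # setdefault stores the row only if the key is not present yet,
        # so later rows with the same first two items are ignored
        kept.setdefault(tuple(row[:2]), row)
    # assumption: Python 3.7+, where plain dicts preserve insertion order
    return list(kept.values())


rows = [['test', 'xxxx', 'a'], ['test', 'xxxx', 'b'], ['test', 'X1', 'c']]
print(dedupe_by_first_two(rows))
# [['test', 'xxxx', 'a'], ['test', 'X1', 'c']]
```

The key is converted to a tuple because lists are not hashable and so cannot be used as dict keys.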
answered 2013-10-19T15:33:16.087

Use a list comprehension and a set to keep track of what has been seen:

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

seen = set()
print([li for li in myvariable
       if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))])

Prints:

[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

Because a list comprehension runs in order, the original order is preserved and the later duplicates are removed:

>>> lis=[[1,2,1],
...      [3,4,1],
...      [1,2,2],
...      [3,4,2]]
>>> seen=set()
>>> [li for li in lis if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]
[[1, 2, 1], [3, 4, 1]]
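
The `and not seen.add(...)` clause relies on `set.add` always returning `None`: the right-hand side is only evaluated when the key has not been seen yet, it records the key as a side effect, and `not None` is true. A more explicit sketch of the same logic is shown here. It also updates the original list object in place through slice assignment, which is one possible reading of the question's "update the list" requirement. The name `remove_later_duplicates` is hypothetical.

```python
def remove_later_duplicates(rows):
    """Drop, in place, every row whose first two items were already seen."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[:2])
        if key in seen:
            continue        # later duplicate: skip it
        seen.add(key)       # set.add returns None; called only for its side effect
        kept.append(row)
    rows[:] = kept          # slice assignment mutates the caller's list object


lis = [[1, 2, 1], [3, 4, 1], [1, 2, 2], [3, 4, 2]]
remove_later_duplicates(lis)
print(lis)  # [[1, 2, 1], [3, 4, 1]]
```

Building a new list and assigning it back avoids deleting items from a list while iterating over it, which tends to skip elements.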

It is also worth noting that this is the faster approach:

from collections import OrderedDict  

lis=[[1,2,1],
     [3,4,1],
     [1,2,2],
     [3,4,2]]

def f1(lis):
    seen=set()
    return [li for li in lis 
             if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]       

def f2(lis):
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item

    return list(dic.values())

if __name__ == '__main__':
    import timeit
    # timeit.timeit runs each statement 1,000,000 times by default
    print('f1, LC+set:', timeit.timeit("f1(lis)", setup="from __main__ import f1,lis"), 'secs')
    print('f2, OrderedDic:', timeit.timeit("f2(lis)", setup="from __main__ import f2,lis,OrderedDict"), 'secs')

Prints:

f1, LC+set: 2.81167197227 secs
f2, OrderedDic: 16.4299631119 secs

So this approach is nearly 6 times faster.

answered 2013-10-19T18:53:48.580

This list comprehension preserves order and removes every duplicate after the first.

>>> check = [L[0:2] for L in myvariable]
>>> [el for i, el in enumerate(myvariable) if el[0:2] not in check[:i]]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

Here is a solution using a list comprehension and a standard dict, which will perform better on larger lists.

>>> d={}
>>> [d.setdefault(tuple(el[:2]), el) for el in myvariable if tuple(el[:2]) not in d]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
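
The first snippet scans `check[:i]` for every element, so its cost grows quadratically with the list length, while the dict lookup is constant time on average. For large or streamed input, the same hash-based idea can be sketched as a generator, in the spirit of the `unique_everseen` recipe from the itertools documentation. The name `unique_by_key` and its `key` parameter are assumptions made for this illustration.

```python
from operator import itemgetter


def unique_by_key(rows, key):
    """Yield rows lazily, skipping any row whose key(row) was already seen."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


rows = [[1, 2, 1], [3, 4, 1], [1, 2, 2], [3, 4, 2]]
# itemgetter(0, 1) returns the tuple (row[0], row[1]), which is hashable
print(list(unique_by_key(rows, key=itemgetter(0, 1))))  # [[1, 2, 1], [3, 4, 1]]
```

Passing the key as a function keeps the comparison rule (here, the first two items) separate from the de-duplication loop.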
answered 2013-10-20T01:57:23.927