Use a list comprehension and a set to keep track of what has been seen:
myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
              ]
seen = set()
# seen.add() returns None, so `not seen.add(...)` is always True;
# it only records the key as a side effect.
print([li for li in myvariable
       if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))])
Prints:
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
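The one-liner relies on `set.add` returning `None`, which can be hard to read at first. As a rough sketch of the same idea written as an explicit loop (the function name `dedupe_by_prefix` and its parameter `n` are illustrative assumptions, not part of the original answer):

```python
def dedupe_by_prefix(rows, n=2):
    """Keep the first row for each distinct n-element prefix, preserving order."""
    seen = set()
    result = []
    for row in rows:
        # Lists are unhashable, so the slice is converted to a tuple
        # before it is used as a set member.
        key = tuple(row[:n])
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Shortened sample rows (hypothetical data in the shape used above).
rows = [['test', 'xxxx', 'DDDt'],
        ['test', 'xxxx', 'EEEst'],
        ['test', 'X1', 'DDDt']]
print(dedupe_by_prefix(rows))
# [['test', 'xxxx', 'DDDt'], ['test', 'X1', 'DDDt']]
```

The loop does the same work as the comprehension; it just makes the "check, then record" step explicit.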
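Because a list comprehension processes its items in order, the original order is preserved and any later row whose first two elements repeat an earlier row is dropped: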
>>> lis=[[1,2,1],
... [3,4,1],
... [1,2,2],
... [3,4,2]]
>>> seen=set()
>>> [li for li in lis if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]
[[1, 2, 1], [3, 4, 1]]
It is also worth noting that this is the faster approach compared with an OrderedDict:
from collections import OrderedDict

lis = [[1, 2, 1],
       [3, 4, 1],
       [1, 2, 2],
       [3, 4, 2]]

def f1(lis):
    # List comprehension plus a set of already-seen keys.
    seen = set()
    return [li for li in lis
            if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]

def f2(lis):
    # OrderedDict keyed on the first two elements; the first row per key wins.
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item
    return dic.values()  # a list in Python 2, a view object in Python 3

if __name__ == '__main__':
    import timeit
    print('f1, LC+set: {} secs'.format(
        timeit.timeit("f1(lis)", setup="from __main__ import f1,lis")))
    print('f2, OrderedDic: {} secs'.format(
        timeit.timeit("f2(lis)", setup="from __main__ import f2,lis,OrderedDict")))
Prints:
f1, LC+set: 2.81167197227 secs
f2, OrderedDic: 16.4299631119 secs
So, on this input, the list comprehension with a set is nearly 6 times faster.
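The timing above uses a four-row list, so it may be worth repeating the comparison on a larger input. The following is only a sketch: the generated data, its size, the seed, and the repeat count are arbitrary assumptions, and no particular ratio is claimed, since results depend on the Python version (OrderedDict has been implemented in C since Python 3.5) and on the data.

```python
import random
import timeit
from collections import OrderedDict


def f1(lis):
    # List comprehension plus a set of already-seen keys.
    seen = set()
    return [li for li in lis
            if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]


def f2(lis):
    # OrderedDict keyed on the first two elements; the first row per key wins.
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item
    return list(dic.values())  # list() so both functions return the same type


# Hypothetical test data: 10,000 rows with many repeated two-element prefixes.
random.seed(0)
big = [[random.randint(0, 50), random.randint(0, 50), i] for i in range(10000)]

# Both approaches should keep exactly the same rows in the same order.
assert f1(big) == f2(big)

for name, func in (('f1', f1), ('f2', f2)):
    secs = timeit.timeit(lambda: func(big), number=20)
    print('{}: {:.4f} secs'.format(name, secs))
```

Checking that both functions agree before timing them guards against comparing the speed of two routines that do different work.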