11

I have a list containing nested lists, and I need to know the most efficient way to search within those nested lists.

For example, if I have

[['a','b','c'],
['d','e','f']]

and I have to search the entire list above, what is the most efficient way to find 'd'?

4

5 Answers

12
>>> lis=[['a','b','c'],['d','e','f']]
>>> any('d' in x for x in lis)
True

generator expression using any

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('d' in x for x in lis)" 
1000000 loops, best of 3: 1.32 usec per loop

generator expression

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 1.56 usec per loop

list comprehension

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 3.23 usec per loop

How about if the item is near the end, or not present at all? any is faster than the list comprehension

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" \
    "'NOT THERE' in [y for x in lis for y in x]"
100000 loops, best of 3: 4.4 usec per loop

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" \
    "any('NOT THERE' in x for x in lis)"
100000 loops, best of 3: 3.06 usec per loop

Perhaps if the list is 1000 times longer? any is still faster

$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" \
    "'NOT THERE' in [y for x in lis for y in x]"
100 loops, best of 3: 3.74 msec per loop
$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" \
    "any('NOT THERE' in x for x in lis)"
100 loops, best of 3: 2.48 msec per loop

We know that generators take a while to set up, so the best chance for the LC to win is a very short list

$ python -m timeit -s "lis=[['a','b','c']]" \
    "any('c' in x for x in lis)"
1000000 loops, best of 3: 1.12 usec per loop
$ python -m timeit -s "lis=[['a','b','c']]" \
    "'c' in [y for x in lis for y in x]"
1000000 loops, best of 3: 0.611 usec per loop

And any uses less memory too
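
The reason any wins on longer inputs is that it stops at the first sublist containing the target and never builds a flattened copy. A minimal sketch of that behaviour, assuming Python 3 (for nonlocal); contains and sublists_inspected are hypothetical helper names used only for illustration:

```python
lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6]]

def contains(nested, target):
    """Return True if target appears in any sublist of nested."""
    return any(target in sub for sub in nested)

def sublists_inspected(nested, target):
    """Count how many sublists any() examines before it stops (illustration only)."""
    inspected = 0

    def checks():
        nonlocal inspected
        for sub in nested:
            inspected += 1
            yield target in sub

    any(checks())  # any() stops pulling from the generator at the first True
    return inspected

print(contains(lis, 'd'))               # True
print(sublists_inspected(lis, 'd'))     # 2: stops after the second sublist
print(sublists_inspected(lis, 'zzz'))   # 4: a miss has to look at every sublist
```

A miss still costs a full scan, which matches the 'NOT THERE' timings above: any only avoids the cost of building the intermediate list.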

answered 2012-08-15T04:43:18.147
6

Using a list comprehension, given:

mylist = [['a','b','c'],['d','e','f']]
'd' in [j for i in mylist for j in i]

yields:

True

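This can also be done with a generator (as shown by @AshwiniChaudhary).

Update based on the comment below:

Here is the same list comprehension, but with more descriptive variable names: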

'd' in [elem for sublist in mylist for elem in sublist]

The loop structure in the list comprehension is equivalent to

for sublist in mylist:
   for elem in sublist:

and it produces a list that can be tested for 'd' with the in operator.
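
To make the equivalence concrete, here is a small sketch that writes the comprehension out as explicit loops and checks that both give the same flattened list. The name flat is an assumption introduced for illustration:

```python
mylist = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Explicit nested loops: the outer loop comes first, exactly as in the comprehension
flat = []
for sublist in mylist:
    for elem in sublist:
        flat.append(elem)

# The comprehension builds the same list in one expression
assert flat == [elem for sublist in mylist for elem in sublist]

print(flat)         # ['a', 'b', 'c', 'd', 'e', 'f']
print('d' in flat)  # True
```

Note that the whole flattened list is built before the in test runs, even when the target is the very first element.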

answered 2012-08-15T03:24:08.990
4

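Use a generator expression. The whole list is not traversed here, because the generator yields results one at a time and the search stops at the first match: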

>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
>>> gen = (y for x in lis for y in x)
>>> 'd' in gen
True
>>> list(gen)
['e', 'f']

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.96 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 7.4 usec per loop
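
The same lazy flattening can also be sketched with itertools.chain.from_iterable from the standard library, which yields the elements of each sublist in turn without building a new list. This is an alternative spelling rather than the approach timed above, and no timing is claimed for it:

```python
from itertools import chain

lis = [['a', 'b', 'c'], ['d', 'e', 'f']]

# chain.from_iterable lazily walks each sublist; `in` stops at the first match
print('d' in chain.from_iterable(lis))  # True

# Like the generator above, the iterator is consumed only up to the match
it = chain.from_iterable(lis)
print('d' in it)   # True
print(list(it))    # ['e', 'f']
```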
answered 2012-08-15T03:23:31.173
2

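If your arrays are always sorted the way you show, so that a[i][j] <= a[i][j+1] and a[i][-1] <= a[i+1][0] (the last element of one array is always less than or equal to the first element of the next array), then you can eliminate a lot of comparisons by doing the following: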

def search_sorted(a, theValue):
    # a is your big array; every subarray is assumed to be non-empty
    previous = None
    for subarray in a:
        # Since the subarrays are sorted, once a subarray starts above
        # theValue we know it's not in this one or any later one, so it
        # can only be in the previous one
        if subarray[0] > theValue:
            break
        # Otherwise, we keep track of the last array we looked at
        previous = subarray

    return (theValue in previous) if previous else False

This optimization is only worthwhile if you have many arrays and they all have many elements.
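
Under the same sortedness assumption, the linear scans could be replaced by binary search using the standard bisect module. This is only a sketch: contains_sorted is a hypothetical name, it assumes no sublist is empty, and rebuilding firsts on every call is itself linear, so in practice that list would be computed once and reused.

```python
from bisect import bisect_left, bisect_right

def contains_sorted(nested, value):
    """Binary-search a list of sorted, non-empty sublists laid out in global order."""
    # First element of each sublist; precompute this when searching repeatedly
    firsts = [sub[0] for sub in nested]

    # Index of the last sublist whose first element is <= value
    i = bisect_right(firsts, value) - 1
    if i < 0:
        return False  # value is smaller than every element

    # Binary search inside the one candidate sublist
    sub = nested[i]
    j = bisect_left(sub, value)
    return j < len(sub) and sub[j] == value

a = [['a', 'b', 'c'], ['d', 'e', 'f']]
print(contains_sorted(a, 'd'))  # True
print(contains_sorted(a, 'z'))  # False
```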

answered 2012-08-15T03:37:53.173
0

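If you only want to know whether your element is in the list, you can do that by converting the list to a string and checking the string. This extends to more deeply nested lists, such as [[1],'a','b','d',['a','b',['c',1]]], when you don't know the nesting depth and just want to know whether the item you are searching for is present.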

    search='d'
    lis = [['a',['b'],'c'],[['d'],'e','f']]
    print(search in str(lis)) 
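
One caveat with the string approach is that it tests for a substring of the list's printed form, so 'd' would also match an element such as 'add'. A minimal recursive sketch that compares whole elements instead is shown below; deep_contains is a hypothetical name, and it assumes the nesting consists only of lists:

```python
def deep_contains(nested, target):
    """Return True if target equals an element at any nesting depth of nested."""
    for item in nested:
        if item == target:
            return True
        # Recurse into sublists only; other containers are treated as plain items
        if isinstance(item, list) and deep_contains(item, target):
            return True
    return False

lis = [['a', ['b'], 'c'], [['d'], 'e', 'f']]
print(deep_contains(lis, 'd'))        # True
print(deep_contains([['add']], 'd'))  # False: 'add' is not equal to 'd'
print('d' in str([['add']]))          # True: the string check matches a substring
```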
answered 2018-07-27T20:30:21.553