python - Python列表理解和多次引用一个对象

Question

这是一个人为的示例，用于演示在 for 循环和列表理解中多次引用同一个字典项。首先，for循环：

dict_index_mylists = {0:['a', 'b', 'c'], 1:['b', 'c', 'a'], 2:['c', 'a', 'b']}

# for-loop
myseq = []
for i in [0, 1, 2]:
    interim = dict_index_mylists[i]
    if interim[0] == 'b' or interim[1] == 'c' or interim[2] == 'a':    
        myseq.append(interim)

在 for 循环中，临时列表从字典对象中引用，然后在 if 条件中被多次引用，如果字典非常大和/或在存储中，这可能特别有意义。再说一次，“临时”引用可能是不必要的，因为 Python 字典已针对性能进行了优化。

这是 for 循环的列表理解：

# list-comprehension
myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']

问题是：

一种。列表理解是否多次引用字典项目，还是引用并保留一个本地“临时”列表以进行处理？

湾。在同一个字典项上包含多个条件并且字典非常大的最佳列表理解表达式是什么？

score 1 · Accepted Answer

You seem to be asking only about optimization of common sub-expressions. In your list comprehension, it will index into the dictionary multiple times. Python is dynamic, it is difficult to know what side effects an operation like dict_index_mylists[i] might have, so CPython simply executes the operation as many times as you tell it to.

Other implementations like PyPy use a JIT and may optimize away subexpressions, but it is difficult to know for sure what it will do ahead of time.

If you are very concerned with performance, you need to time various options to see which is best.

score 0 · Accepted Answer

我不是查看 python 字节码的专家，但这是我今天早上尝试学习新东西的尝试：

def dostuff():
    myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']

import dis
dis.dis(dostuff)

如果您查看输出（如下），则有 4 次调用LOAD_GLOBAL，因此看起来 python 没有存储临时列表。至于你的第二个问题，你所拥有的可能是你能做到的。不过，它并没有你想象的那么糟糕。 dict对象通过散列函数访问项目，因此它们的查找复杂度O(1)与字典大小无关。当然，您总是可以使用timeit和比较这两种实现（使用循环和列表比较），然后选择更快的一种。剖析（一如既往）是您的朋友。

APENDIX (output of dis.dis(dostuff))

5           0 BUILD_LIST               0
            3 DUP_TOP             
            4 STORE_FAST               0 (_[1])
            7 LOAD_CONST               1 (0)
           10 LOAD_CONST               2 (1)
           13 LOAD_CONST               3 (2)
           16 BUILD_LIST               3
           19 GET_ITER            
      >>   20 FOR_ITER                84 (to 107)
           23 STORE_FAST               1 (i)
           26 LOAD_GLOBAL              0 (dict_index_mylists)
           29 LOAD_FAST                1 (i)
           32 BINARY_SUBSCR       
           33 LOAD_CONST               1 (0)
           36 BINARY_SUBSCR       
           37 LOAD_CONST               4 ('b')
           40 COMPARE_OP               2 (==)
           43 JUMP_IF_TRUE            42 (to 88)
           46 POP_TOP             
           47 LOAD_GLOBAL              0 (dict_index_mylists)
           50 LOAD_FAST                1 (i)
           53 BINARY_SUBSCR       
           54 LOAD_CONST               2 (1)
           57 BINARY_SUBSCR       
           58 LOAD_CONST               5 ('c')
           61 COMPARE_OP               2 (==)
           64 JUMP_IF_TRUE            21 (to 88)
           67 POP_TOP             
           68 LOAD_GLOBAL              0 (dict_index_mylists)
           71 LOAD_FAST                1 (i)
           74 BINARY_SUBSCR       
           75 LOAD_CONST               3 (2)
           78 BINARY_SUBSCR       
           79 LOAD_CONST               6 ('a')
           82 COMPARE_OP               2 (==)
           85 JUMP_IF_FALSE           15 (to 103)
      >>   88 POP_TOP             
           89 LOAD_FAST                0 (_[1])
           92 LOAD_GLOBAL              0 (dict_index_mylists)
           95 LOAD_FAST                1 (i)
           98 BINARY_SUBSCR       
           99 LIST_APPEND         
          100 JUMP_ABSOLUTE           20
      >>  103 POP_TOP             
          104 JUMP_ABSOLUTE           20
      >>  107 DELETE_FAST              0 (_[1])
          110 STORE_FAST               2 (myseq)
          113 LOAD_CONST               0 (None)
          116 RETURN_VALUE

score 0 · Accepted Answer

First point: nothing (expect 'myseq') is "created" here, neither in the forloop nor in the listcomp versions of your code - it's just a reference to the existing dict item.

Now to answer you questions : the list comp version will make a lookup (a call to dict.__getitem__ for each of the dict_index_mylists[i] expressions. Each of these lookup will a return a reference to the same list. You can avoid these extra lookups by retaining a local reference to the dict's items, ie :

myseq = [
    item for item in (dict_index_mylists[i] for i in (0, 1, 2)) 
    if item[0] == 'b' or item[1] == 'c' or item[2] == 'a'
    ]

but there's no point in writing a listcomp just for the sake of writing a listcomp.

Note that if you don't care about the original ordering and want to apply this to your whole dict, using dict.itervalues() would be simpler.

wrt/ the second question, "optimal" is not an absolute. What do you want to optimize for ? space ? time ? readability ?

python - Python列表理解和多次引用一个对象

3 回答 3

Related

Reference