python - 迭代列表时更改列表的最佳方法

Question

我在 python 脚本（v2.6）中有几个实例，我需要就地修改列表。我需要从列表中弹出值以响应用户的交互式输入，并且想知道最干净的方法。目前，我有以下非常肮脏的解决方案：a）将列表中要删除的项目设置为 False 并使用过滤器或列表理解将它们删除，或者 b）在循环时生成一个全新的列表，这似乎是不必要的将变量添加到命名空间并占用内存。

这个问题的一个例子如下：

for i, folder in enumerate(to_run_folders):
    if get_size(folder) < byte_threshold:
        ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                    ' Would you like to exclude it from' + \
                    ' compression? ').format(folder, megabyte_threshold))
        if 'y' in ans.strip().lower():
            to_run_folders.pop(i)

我想查看列表中的每个文件夹。如果当前文件夹小于某个大小，我想询问用户是否要排除它。如果有，请从列表中弹出该文件夹。

这个例程的问题是，如果我遍历列表，我会得到意外的行为和提前终止。如果我通过切片对副本进行迭代，pop 不会提取正确的值，因为索引被移动，并且随着更多项目的弹出，问题变得更加复杂。我也需要在脚本的其他区域进行这种动态列表调整。这种功能有什么干净的方法吗？

score 9 · Accepted Answer

您可以向后循环列表，或使用视图对象。

有关如何向后循环列表，请参阅https://stackoverflow.com/a/181062/711085 。基本上使用reversed(yourList)（这会创建一个向后访问的视图对象）。

如果你需要索引，你可以这样做reversed(enumerate(yourList))，但这会有效地在内存中创建一个临时列表，因为enumerate需要在启动之前运行reversed。你需要进行索引操作，或者这样做：

for i in xrange(len(yourList)-1, -1, -1):
    item = yourList[i]
    ...

甚至更清洁：reversed知道range，因此您可以在 python3 中执行此操作，或者如果您改用 python2 则可以在 python2 中执行此操作xrange：

for i in reversed(range(len(yourList))):  
    item = yourList[i]
    ...

（证明：你可以这样做next(reversed(range(10**10)))，但如果使用 python2，这会使你的计算机崩溃）

score 4 · Accepted Answer

你可以向后循环

向后：

x = range(10)
l = len(x)-1  # max index

for i, v in enumerate(reversed(x)):
    if v % 2:
        x.pop(l-i)  # l-1 is the forward index

score 1 · Accepted Answer

目前，我有以下非常肮脏的解决方案：a）将列表中要删除的项目设置为 False 并使用过滤器或列表理解将它们删除，或者 b）在循环时生成一个全新的列表，这似乎是不必要的将变量添加到命名空间并占用内存。

实际上，这不是那么肮脏的解决方案。列表通常有多长？即使创建新列表也不应该消耗太多内存，因为列表只包含引用。

您还可以在循环中while循环并自己枚举，del lst[n]如果用户决定则执行（可能单独计算原始位置中的位置）。

score 1 · Accepted Answer

好的，我已经测量了解决方案。颠倒的解决方案大致相同。前向while循环慢了大约 4 倍。但！对于 100,000 个随机整数列表[Patrik2 中的错误已更正] ，Patrik 的脏解决方案大约快 80 倍：

import timeit
import random

def solution_ninjagecko1(lst):
    for i in xrange(len(lst)-1, -1, -1):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]
    return lst

def solution_jdi(lst):
    L = len(lst) - 1
    for i, v in enumerate(reversed(lst)):
        if v % 2 != 0:
            lst.pop(L-i)  # L-1 is the forward index
    return lst

def solution_Patrik(lst):
    for i, v in enumerate(lst):
        if v % 2 != 0:         # simulation of the choice
            lst[i] = None
    return [v for v in lst if v is not None]

def solution_Patrik2(lst):
    ##buggy lst = [v for v in lst if v % 2 != 0]
    ##buggy return [v for v in lst if v is not None]
    # ... corrected to
    return [v for v in lst if v % 2 != 0]

def solution_pepr(lst):
    i = 0                      # indexing the processed item
    n = 0                      # enumerating the original position
    while i < len(lst):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # i unchanged if item deleted
        else:
            i += 1             # i moved to the next
        n += 1
    return lst

def solution_pepr_reversed(lst):
    i = len(lst) - 1           # indexing the processed item backwards
    while i > 0:
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # i unchanged if item deleted
        i -= 1                 # i moved to the previous
    return lst

def solution_steveha(lst):
    def should_keep(x):
        return x % 2 == 0
    return filter(should_keep, lst)

orig_lst = range(30)
print 'range() generated list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
print solution_ninjagecko1(lst)

lst = orig_lst[:]  # copy of the list
print solution_jdi(lst)

lst = orig_lst[:]  # copy of the list
print solution_Patrik(lst)

lst = orig_lst[:]  # copy of the list
print solution_pepr(lst)

orig_lst = [random.randint(1, 1000000) for n in xrange(100000)]
print '\nrandom list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_ninjagecko1(lst)',
                  'from __main__ import solution_ninjagecko1, lst',
                  number=1)
print 'solution_ninjagecko1: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_jdi(lst)',
                  'from __main__ import solution_jdi, lst',
                  number=1)
print 'solution_jdi: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik(lst)',
                  'from __main__ import solution_Patrik, lst',
                  number=1)
print 'solution_Patrik: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik2(lst)',
                  'from __main__ import solution_Patrik2, lst',
                  number=1)
print 'solution_Patrik2: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr_reversed(lst)',
                  'from __main__ import solution_pepr_reversed, lst',
                  number=1)
print 'solution_pepr_reversed: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr(lst)',
                  'from __main__ import solution_pepr, lst',
                  number=1)
print 'solution_pepr: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_steveha(lst)',
                  'from __main__ import solution_steveha, lst',
                  number=1)
print 'solution_steveha: ', t

它打印在我的控制台上：

c:\tmp\_Python\Patrick\so10305762>python a.py
range() generated list of the length 30
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, '...']
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

random list of the length 100000
[915411, 954538, 794388, 847204, 846603, 454132, 866165, 640004, 930488, 609138,
 333405, 986073, 318301, 728151, 996047, 117633, 455353, 581737, 55350, 485030,
'...']
solution_ninjagecko1:  2.41921752625
solution_jdi:  2.45477176569
solution_Patrik:  0.0468565138865
solution_Patrik2:  0.024270403082
solution_pepr_reversed:  2.43338888043
solution_pepr:  9.11879694207

所以，我尝试了更长的列表。只使用两倍的时间会产生很大的不同（在我的旧电脑上）。Patrik 的肮脏解决方案表现得非常好。它比反向解决方案快约 200 倍：

random list of the length 200000
[384592, 170167, 598270, 832363, 123557, 81804, 319315, 445945, 178732, 726600,
516835, 392267, 552608, 40807, 349215, 208111, 880032, 520614, 384119, 350090, 
'...']
solution_ninjagecko1:  17.362140719
solution_jdi:  17.86837545
solution_Patrik:  0.0957998851809
solution_Patrik2:  0.0500024444448
solution_pepr_reversed:  17.5078452708
solution_pepr:  52.175648581

[在 ninjagecko 评论后添加]

修正后的 Patrik2 解决方案比 2 阶段 Patrick 解决方案快大约两倍。

为了模拟不那么频繁地删除元素，测试 likeif v % 2 != 0:已更改为if v % 100 == 0:. 然后应该删除大约 1% 的项目。很明显，它需要更少的时间。对于列表中的 500,000 个随机整数，结果如下：

random list of the length 500000
[403512, 138135, 552313, 427971, 42358, 500926, 686944, 304889, 916659, 112636,
791585, 461948, 82622, 522768, 485408, 774048, 447505, 830220, 791421, 580706, 
'...']
solution_ninjagecko1:  6.79284210703
solution_jdi:  6.84066913532
solution_Patrik:  0.241242951269
solution_Patrik2:  0.162481823807
solution_pepr_reversed:  6.92106007886
solution_pepr:  7.12900522273

Patrick 的解决方案仍然快 30 倍左右。

[添加于 2012 年 4 月 25 日]

另一种就地工作的解决方案，向前循环，与帕特里克的解决方案一样快。删除元素时，它不会移动所有尾部。相反，它将想要的元素移动到它们的最终位置，然后切断列表中未使用的尾部。

def solution_pepr2(lst):
    i = 0
    for v in lst:
        lst[i] = v              # moving the element (sometimes unneccessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cutting the tail of the length of the skipped
    return lst

# The following one only adds the enumerate to simulate the situation when
# it is needed -- i.e. slightly slower but with the same complexity.        
def solution_pepr2enum(lst):
    i = 0
    for n, v in enumerate(lst):
        lst[i] = v              # moving the element (sometimes unneccessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cutting the tail of the length of the skipped
    return lst

与上述解决方案相比v % 100 != 0：

random list of the length 500000
[533094, 600755, 58260, 295962, 347612, 851487, 523927, 665648, 537403, 238660,
781030, 940052, 878919, 565870, 717745, 408465, 410781, 560173, 51010, 730322, 
'...']
solution_ninjagecko1:  1.38956896051
solution_jdi:  1.42314502685
solution_Patrik:  0.135545530079
solution_Patrik2:  0.0926935780151
solution_pepr_reversed:  1.43573239178
solution_steveha:  0.122824246805
solution_pepr2:  0.0938177241656
solution_pepr2enum:  0.11096263294

score 0 · Accepted Answer

处理这个问题的最好方法，最“Pythonic”的方法，实际上是循环你的列表并创建一个只包含你想要的文件夹的新列表。这是我的做法：

def want_folder(fname):
    if get_size(folder) >= byte_threshold:
        return True
    ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                ' Would you like to exclude it from' + \
                ' compression? ').format(folder, megabyte_threshold))
    return 'y' not in ans.strip().lower()

to_run_folders = [fname for fname in to_run_folders if want_folder(fname)]

如果您的列表确实很大，那么您可能需要担心此解决方案的性能并使用肮脏的技巧。但是，如果您的列表那么大，那么让人工回答有关可能显示的所有文件的是/否问题可能有点疯狂。

性能是一个实际问题还是只是一种烦人的担忧？因为我很确定上面的代码对于实际使用来说已经足够快了，而且它比复杂的代码更容易理解和修改。

编辑：@jdi 在评论中建议使用itertools.ifilter()或filter()

我测试过，这实际上应该比我上面显示的要快：

to_run_folders = filter(want_folder, to_run_folders)

我刚刚复制了@pepr 的基准测试代码，并使用filter()此处所示的方法测试了解决方案。它总体上是第二快的，只有 Patrik2 更快。Patrik2 的速度是原来的两倍，但同样，任何小到足以让人类回答“是/否”问题的数据集都可能足够小，以至于两倍不会有太大影响。

编辑：只是为了好玩，我继续写了一个纯列表理解的版本。它只有一个表达式来评估，没有 Python 函数调用。

to_run_folders = [fname for fname in to_run_folders
        if get_size(fname) >= mb_threshold or
                'y' not in raw_input(('The folder {0}/ is less than {1}MB.' +
                ' Would you like to exclude it from compression? '
                ).format(fname, mb_threshold)).strip().lower()

]

呸！我更喜欢制作一个功能。

python - 迭代列表时更改列表的最佳方法

5 回答 5

Related

Reference