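Here's an interesting one. I was actually writing an answer to another question when I ran into some unexpected results using filter or generators. I have a list of file paths: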

paths = ['/directoryb/baba.txt', '/directorya/nigel.txt', '/directoryb/ralph.txt', '/directorya/jim.txt']

I create a set of the distinct directories in the list of paths:

from os.path import dirname
dirs = {dirname(path) for path in paths}

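Now I want to make a list of generators (or even a generator of generators), where each generator yields the elements of paths that are in the same directory. So I do this: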

dirs_iter = [(path for path in paths if path.startswith(dir)) for dir in dirs]

Imagine my surprise when I ran:

for dir_iter in dirs_iter:
    for path in dir_iter:
        print(path)

and got the following:

/directorya/nigel.txt
/directorya/jim.txt
/directorya/nigel.txt
/directorya/jim.txt

This is obviously wrong. However, if I use the following statement:

# now I'm generating the lists instead of using generators
dirs_iter = [[path for path in paths if path.startswith(dir)] for dir in dirs]

the printing loop shows the expected answer:

/directoryb/baba.txt
/directoryb/ralph.txt
/directorya/nigel.txt
/directorya/jim.txt

If I use filter and/or map instead of generators:

dirs_iter = map(lambda dir: filter(lambda path: path.startswith(dir), paths), dirs)

I also get the wrong answer. Edit: the map/filter version does work.

What is going on here?

1 Answer

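The name dir is a closure variable; it is looked up when the generator is executed, not when it is defined. By that time, dir is bound to the last value in dirs: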

>>> from os.path import dirname
>>> paths = ['/directoryb/baba.txt', '/directorya/nigel.txt', '/directoryb/ralph.txt', '/directorya/jim.txt']
>>> dirs = {dirname(path) for path in paths}
>>> def echo(value):
...     print('echoing:', value)
...     return value
... 
>>> dirs_iter = [(path for path in paths if path.startswith(echo(dir))) for dir in dirs]
>>> for dir_iter in dirs_iter:
...     print('Iterating over the next dir_iter generator')
...     for path in dir_iter:
...         print(path)
... 
Iterating over the next dir_iter generator
echoing: /directoryb
/directoryb/baba.txt
echoing: /directoryb
echoing: /directoryb
/directoryb/ralph.txt
echoing: /directoryb
Iterating over the next dir_iter generator
echoing: /directoryb
/directoryb/baba.txt
echoing: /directoryb
echoing: /directoryb
/directoryb/ralph.txt
echoing: /directoryb
>>> list(dirs)
['/directorya', '/directoryb']

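Because Python 3 uses a randomized hash seed, in my run /directoryb came last rather than /directorya, but you can see that dir is only accessed (and echoed) when we actually iterate over the dir_iter generators, and by then it is set to just one value. The list(dirs) line shows the order in which the dirs set produces its values.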
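
One way to avoid the late lookup is to bind the directory at the moment each generator is created, for example by building the generator inside a helper function. The following is only a minimal sketch: paths_in is a hypothetical helper name that does not appear in the question, and sorted(dirs) is used purely to make the order deterministic.

```python
from os.path import dirname

paths = ['/directoryb/baba.txt', '/directorya/nigel.txt',
         '/directoryb/ralph.txt', '/directorya/jim.txt']
dirs = {dirname(path) for path in paths}

def paths_in(directory, paths):
    # `directory` is a parameter of this call, so each generator created
    # here closes over its own binding instead of a shared loop variable.
    return (path for path in paths if path.startswith(directory))

# sorted() is only here to make the iteration order deterministic.
dirs_iter = [paths_in(d, paths) for d in sorted(dirs)]
result = [list(dir_iter) for dir_iter in dirs_iter]
print(result)
```

Each generator now filters on the directory it was created for, no matter when it is eventually consumed.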

Note that filter() does not have this problem; your map() / filter() combination works just fine:

>>> dirs_iter = map(lambda dir: filter(lambda path: path.startswith(dir), paths), dirs)
>>> for dir_iter in dirs_iter:
...     for path in dir_iter:
...         print(path)
... 
/directorya/nigel.txt
/directorya/jim.txt
/directoryb/baba.txt
/directoryb/ralph.txt
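
The difference comes down to scope: every call to the outer lambda creates a fresh binding for its argument, whereas the nested generator expressions all share the single loop variable of the enclosing comprehension. A small sketch with made-up names (shared, make, separate) that isolates the two behaviors, assuming Python 3:

```python
# Late binding: all three generators share the comprehension's single `n`,
# which holds its final value by the time they are consumed.
shared = [(n for _ in range(1)) for n in range(3)]
late = [next(g) for g in shared]

# Fresh binding per call: each invocation of the lambda gets its own `n`.
make = lambda n: (n for _ in range(1))
separate = [make(n) for n in range(3)]
early = [next(g) for g in separate]

print(late)   # [2, 2, 2]
print(early)  # [0, 1, 2]
```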
answered 2014-10-22T18:03:52.193