A recent similar question (isinstance(foo, types.GeneratorType) or inspect.isgenerator(foo)?) got me curious about how to implement this generically. It actually seems like a generally useful thing to have a generator-type object that caches the first time through (like itertools.cycle does), reports StopIteration, and then returns items from the cache on the next pass — but if the object isn't a generator (i.e. a list or dict that inherently supports O(1) lookup), then don't cache, and get the same behavior, but on the original list.
Possibilities:

1) Modify itertools.cycle. It looks like this:
def cycle(iterable):
    saved = []
    try:
        saved.append(iterable.next())
        yield saved[-1]
        isiter = True
    except:
        saved = iterable
        isiter = False
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    for element in iterable:
        yield element
        if isiter:
            saved.append(element)
    # ??? What next?
If I could restart the generator, that would be perfect — I could send back a StopIteration, and then on the next gen.next() return entry 0, i.e. 'A B C D StopIteration A B C D StopIteration' — but it doesn't look like that is actually possible.

Second best would be that once StopIteration is hit, saved holds a cache. But it doesn't look like there is any way to get at the internal saved[] field. Maybe a class version of this?
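A class version along those lines is doable — here is a minimal sketch (the class name CachingIterable and the attribute names are my own, not from any library). It yields items while filling a cache on the first pass, and serves later passes straight from the cache, which is then accessible as an attribute:

```python
class CachingIterable(object):
    """Iterate a source once, caching items; replay from the cache afterwards."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []          # the 'saved' list, now reachable from outside
        self._exhausted = False

    def __iter__(self):
        if self._exhausted:
            # Later passes: iterate the cached list directly.
            return iter(self._cache)
        return self._fill()

    def _fill(self):
        # First pass: append each item to the cache as it is yielded.
        for item in self._source:
            self._cache.append(item)
            yield item
        self._exhausted = True


ci = CachingIterable(x * x for x in range(5))
print(list(ci))      # first pass consumes the generator
print(list(ci))      # second pass replays from ci._cache
```

Note this sketch assumes the first pass runs to completion before the second begins; interleaved partial passes would need more bookkeeping.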
2) Or I could pass the list in directly:
def cycle(iterable, saved=[]):
    saved.clear()
    try:
        saved.append(iterable.next())
        yield saved[-1]
        isiter = True
    except:
        saved = iterable
        isiter = False
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    for element in iterable:
        yield element
        if isiter:
            saved.append(element)

mysaved = []
myiter = cycle(someiter, mysaved)
But that looks nasty. In C/C++ I could pass in a reference and change the actual reference to saved to point at iterable — you can't actually do that in Python. So this doesn't even work.

Other options?
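Concretely, assigning to a parameter name inside a function only rebinds the local name; the caller's list is untouched. A quick illustration (names are just for demonstration):

```python
def rebind(saved, iterable):
    saved = iterable       # rebinds the local name 'saved' only
    saved.append('x')      # mutates 'iterable', not the caller's original list

mysaved = []
rebind(mysaved, ['a', 'b'])
print(mysaved)             # still [] — the caller's reference never changed
```

So the `saved = iterable` branch inside cycle() silently disconnects the caller's mysaved from what the generator is using.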
Edit: More data. The CachingIterable approach appears to be too slow to be effective, but it did push me in a direction that might work. It's slightly slower than the naive approach (converting to a list myself), but appears to take no hit if the input is already a list.

Some code and data:
def cube_generator(max=100):
    i = 0
    while i < max:
        yield i*i*i
        i += 1
# Base case: use generator each time
%%timeit
cg = cube_generator(); [x for x in cg]
cg = cube_generator(); [x for x in cg]
cg = cube_generator(); [x for x in cg]
10000 loops, best of 3: 55.4 us per loop
# Fastest case: flatten to list, then iterate
%%timeit
cg = cube_generator()
cl = list(cg)
[x for x in cl]
[x for x in cl]
[x for x in cl]
10000 loops, best of 3: 27.4 us per loop
%%timeit
cg = cube_generator()
ci2 = CachingIterable(cg)
[x for x in ci2]
[x for x in ci2]
[x for x in ci2]
1000 loops, best of 3: 239 us per loop
# Another attempt, which is closer to the above
# Not exactly the original solution using next, but close enough i guess
class CacheGen(object):
    def __init__(self, iterable):
        if isinstance(iterable, (list, tuple, dict)):
            self._myiter = iterable
        else:
            self._myiter = list(iterable)
    def __iter__(self):
        return self._myiter.__iter__()
    def __contains__(self, key):
        return self._myiter.__contains__(key)
    def __getitem__(self, key):
        return self._myiter.__getitem__(key)
%%timeit
cg = cube_generator()
ci = CacheGen(cg)
[x for x in ci]
[x for x in ci]
[x for x in ci]
10000 loops, best of 3: 30.5 us per loop
# But if you start with a list, it is faster
cg = cube_generator()
cl = list(cg)
%%timeit
[x for x in cl]
[x for x in cl]
[x for x in cl]
100000 loops, best of 3: 11.6 us per loop
%%timeit
ci = CacheGen(cl)
[x for x in ci]
[x for x in ci]
[x for x in ci]
100000 loops, best of 3: 13.5 us per loop
Any faster recipes that get closer to the 'pure' loop?
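Judging by the numbers above, the remaining ~2 us gap is likely the indirection of going through the wrapper object on each pass. One way to close it is to drop the wrapper entirely and normalize once up front, then iterate the resulting plain sequence — a sketch (the function name ensure_seq is my own):

```python
def ensure_seq(obj):
    """Return obj itself if it already supports cheap re-iteration,
    otherwise drain it into a list once."""
    if isinstance(obj, (list, tuple, dict, set)):
        return obj              # already O(1)-lookup / re-iterable: no copy
    return list(obj)            # generator or other one-shot iterable: cache it

cubes = ensure_seq(i * i * i for i in range(5))
print(cubes)                    # [0, 1, 8, 27, 64]
print([x for x in cubes])       # re-iteration now hits a plain list
```

After the one-time call, every loop runs over a bare list, so it should match the "pure" list timing exactly — at the cost of losing the CacheGen-style object identity around the data.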