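I am looking for a way to "page through" a Python iterator. That is, I would like to wrap a given iterator iter and a page_size in another iterator that returns the items from iter as a series of "pages". Each page would itself be an iterator that yields up to page_size items.
I looked through itertools, and the closest thing I saw is itertools.islice. In some ways, what I want is the opposite of itertools.chain: instead of chaining a series of iterators together into one iterator, I would like to break one iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but could not locate one.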
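To make the islice/chain comparison concrete, here is a small sketch written for Python 3. The sample data and variable names are made up for illustration. It shows that repeated islice calls on one shared iterator pull off successive pages, while chain does the reverse and flattens pages back into a single sequence.

```python
import itertools

# Hypothetical sample data; any iterator would behave the same way.
source = iter(range(10))
page_size = 3

# Each islice call consumes up to page_size items from the shared iterator,
# so the second call picks up where the first one stopped.
first_page = list(itertools.islice(source, page_size))
second_page = list(itertools.islice(source, page_size))

# chain goes the other way: it joins several iterables into one.
flattened = list(itertools.chain(first_page, second_page))
```

What islice does not provide on its own is a signal that the underlying iterator is exhausted: once the source runs dry it simply yields empty pages, which is the gap a pager has to fill.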
I came up with the following pager class and demonstration.
import itertools

class pager(object):
    """
    Takes the iterable iter and page_size to create an iterator that
    "pages through" iter. That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self, iter, page_size):
        self.iter = iter
        self.page_size = page_size

    def __iter__(self):
        return self

    def next(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from
        # https://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration:  # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            for i in itertools.islice(self.iter, self.page_size):
                pass
            return itertools.islice(iter_for_return, self.page_size)

    # Python 3 looks up __next__ rather than next; this alias lets the class run on both.
    __next__ = next

iterator_size = 10
page_size = 3
my_pager = pager(range(iterator_size), page_size)

# skip a page, then print out the rest, and then show the first page
page1 = next(my_pager)
for page in my_pager:
    for i in page:
        print(i)
    print("----")

print("skipped first page: %s" % list(page1))
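
The peek step in the class above relies on how itertools.tee behaves, so here is that step in isolation as a minimal Python 3 sketch. The helper name has_more is hypothetical and is not part of itertools. It clones the iterator, advances only the clone used for peeking, and hands back the other clone, which still sees every item.

```python
import itertools

def has_more(iterator):
    """Return (flag, iterator): whether an item remains, and an iterator to keep using.

    has_more is a hypothetical helper written for this illustration.
    """
    # After tee, the original iterator should no longer be used directly;
    # the two clones share a buffer and replace it.
    iterator, peek = itertools.tee(iterator)
    try:
        next(peek)  # advances only the peek clone
    except StopIteration:
        return False, iterator
    return True, iterator

more, numbers = has_more(iter([1, 2]))
remaining = list(numbers)  # the peeked item is still available here

empty_more, empty_iter = has_more(iter([]))
```

The cost of this approach is that tee buffers every item that one clone has consumed and another has not, which is part of why three clones per page feels heavy.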
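I am looking for some feedback and have the following questions:
- Is there already a pager in itertools that does this job and that I am overlooking?
- Cloning self.iter 3 times seems kludgy to me. One clone is there to check whether self.iter has any more items. I decided to go with a technique that Alex Martelli suggested (aware that he also wrote about a wrapping technique). The second clone is there to make the returned page independent of the internal iterator (self.iter). Is there a way to avoid making 3 clones?
- Is there a better way to deal with the StopIteration exception besides catching it and then raising it again? I am tempted not to catch it at all and just let it bubble up.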
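
For comparison with the second and third questions, here is one possible shape for a pager that avoids tee, offered as a sketch under assumptions rather than a definitive answer. The function name paged is hypothetical. It is written for Python 3 and assumes each page is fully consumed before the next one is requested, because all pages read from the same underlying iterator. That differs from the tee-based class, where a skipped page can still be read later.

```python
import itertools

def paged(iterable, page_size):
    """Yield successive page iterators of up to page_size items each.

    paged is a hypothetical name; pages share one underlying iterator,
    so each page should be exhausted before moving on to the next.
    """
    iterator = iter(iterable)
    # Pulling the first item of each page doubles as the "is there more?" check.
    # When the source is exhausted the for loop ends and the generator finishes,
    # so StopIteration is never caught or re-raised by hand.
    for first in iterator:
        rest = itertools.islice(iterator, page_size - 1)
        yield itertools.chain([first], rest)

pages = [list(page) for page in paged(range(10), 3)]
```

The trade-off is explicit: no buffering and no clones, but pages are no longer independent of each other.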
Thanks! -Raymond