I use a function like this to solve similar problems. You can use it to wrap any iterable.
Change this:

```python
for one_line in f.readlines():
```
All you need to change it to is:

```python
# Don't use readlines(): it builds a big list of all the data in memory
# instead of iterating one line at a time.
for one_line in progress_meter(f, 10000):
```
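You may want to pick a smaller or larger value than 10000, depending on how much time you are willing to waste printing status messages.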
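To see the difference the comment in the loop describes, here is a small sketch that uses `io.StringIO` as a stand-in for an open file (an assumption for the demo; a text file returned by `open()` supports the same calls). `readlines()` materializes every line into a list, while the file object itself is an iterator that reads one line per step, which is what lets a wrapper like `progress_meter` stream through a large file.

```python
import io

# io.StringIO stands in for a real text file here (assumption for the demo).
f = io.StringIO("first\nsecond\nthird\n")

# readlines() reads the whole file and builds a list of every line in memory.
all_lines = f.readlines()
print(type(all_lines), len(all_lines))

# A file object is its own iterator: each next() reads just one more line,
# which is what a plain `for line in f:` loop (and the wrapper) relies on.
f.seek(0)
print(iter(f) is f)
print(repr(next(f)))
```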
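```python
import time

def progress_meter(iterable, chunksize):
    """Prints progress through iterable at chunksize intervals."""
    scan_start = time.time()
    since_last = time.time()
    for idx, val in enumerate(iterable):
        # Report every `chunksize` items (skip index 0: nothing processed yet).
        if idx % chunksize == 0 and idx > 0:
            print(idx)
            # Average rate since the scan started, in items per second.
            print('avg rate', idx / (time.time() - scan_start))
            # Instantaneous rate over the most recent chunk.
            print('inst rate', chunksize / (time.time() - since_last))
            since_last = time.time()
            print()
        yield val
```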
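If you want the status to go somewhere other than stdout (a logger, a GUI, a test), one option is to pass in a reporting callback instead of printing. The sketch below is a hypothetical variant, `progress_meter_cb`, with a `report(count, avg_rate, inst_rate)` parameter; neither name comes from the original answer. It also reads the clock once per report using `time.perf_counter()`, and clamps the elapsed time to a tiny positive value so a very fast loop on a coarse clock cannot divide by zero.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def progress_meter_cb(
    iterable: Iterable[T],
    chunksize: int,
    report: Callable[[int, float, float], None],
) -> Iterator[T]:
    """Yield items from `iterable`, calling report(count, avg_rate, inst_rate)
    every `chunksize` items. Hypothetical variant of progress_meter."""
    scan_start = since_last = time.perf_counter()
    for idx, val in enumerate(iterable):
        # `idx` items have already been yielded when this check runs.
        if idx % chunksize == 0 and idx > 0:
            now = time.perf_counter()
            # Clamp the intervals so a near-zero elapsed time cannot divide by zero.
            avg_rate = idx / max(now - scan_start, 1e-9)
            inst_rate = chunksize / max(now - since_last, 1e-9)
            report(idx, avg_rate, inst_rate)
            since_last = now
        yield val


if __name__ == "__main__":
    def print_report(count, avg_rate, inst_rate):
        print(count, "avg rate %.0f/s" % avg_rate, "inst rate %.0f/s" % inst_rate)

    # Any iterable works: here a generator expression instead of a file.
    total = sum(progress_meter_cb((x * x for x in range(50000)), 10000, print_report))
    print(total)
```

Because it is still a generator that yields every item unchanged, it drops into the same `for` loop as the original; only the side channel for the status messages differs.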