Taking cues from several of the other solutions, but adding a twist...
>>> import itertools
>>> with open('lines.txt', 'r') as lines:
...     chunks = iter(lambda: list(itertools.islice(lines, 7)), [])
...     for chunk in chunks:
...         print(chunk)
...
['0\n', '1\n', '2\n', '3\n', '4\n', '5\n', '6\n']
['7\n', '8\n', '9\n', '10\n', '11\n', '12\n', '13\n']
['14\n', '15\n', '16\n', '17\n', '18\n', '19\n', '20\n']
['21\n', '22\n', '23\n', '24\n', '25\n', '26\n', '27\n']
['28\n', '29\n', '30\n', '31\n', '32\n', '33\n', '34\n']
['35\n', '36\n', '37\n', '38\n', '39\n', '40\n', '41\n']
['42\n', '43\n', '44\n', '45\n', '46\n', '47\n', '48\n']
['49\n', '50\n', '51\n', '52\n', '53\n', '54\n', '55\n']
['56\n', '57\n', '58\n', '59\n', '60\n', '61\n', '62\n']
['63\n', '64\n', '65\n', '66\n', '67\n', '68\n', '69\n']
['70\n', '71\n', '72\n', '73\n', '74\n', '75\n', '76\n']
['77\n', '78\n', '79\n', '80\n', '81\n', '82\n', '83\n']
['84\n', '85\n', '86\n', '87\n', '88\n', '89\n', '90\n']
['91\n', '92\n', '93\n', '94\n', '95\n', '96\n', '97\n']
['98\n', '99\n']
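The idiom above relies on the two-argument form of `iter`, which keeps calling the lambda until it returns the sentinel `[]`. As a minimal sketch, assuming Python 3, it can be wrapped in a small helper. The name `iter_chunks` is made up for this illustration, and `io.StringIO` stands in for the file so the snippet runs on its own.

```python
import io
import itertools


def iter_chunks(lines, n):
    """Return an iterator of lists holding up to n items each (hypothetical helper).

    `lines` must be an iterator, such as an open file. Passing a list would
    make islice start from the beginning on every call and never terminate.
    """
    # iter(callable, sentinel) calls the lambda repeatedly and stops as soon
    # as it returns [], which happens once `lines` is exhausted.
    return iter(lambda: list(itertools.islice(lines, n)), [])


# io.StringIO stands in for an open text file containing the lines 0..9.
fake_file = io.StringIO("".join("%d\n" % i for i in range(10)))
chunks = list(iter_chunks(fake_file, 4))
for chunk in chunks:
    print(chunk)
```

Every chunk has exactly `n` lines except possibly the last, which holds whatever is left over.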
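But here I must concede that, as others have said, using readlines with a byte hint will be a bit faster, as long as you don't need exactly 10000 lines (or 10000 lines every time). However, I'm not convinced that's because it does fewer reads. The docstring for readlines says "Call readline() repeatedly and return a list of the lines so read." So I think the speed gain comes from shaving off a small amount of iterator overhead. Defining (using Marcin's code):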
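import itertools

def do_nothing_islice(filename, nlines):
    with open(filename, 'r') as lines:
        # Two-argument iter(): call the lambda until it returns [].
        chunks = iter(lambda: list(itertools.islice(lines, nlines)), [])
        for chunk in chunks:
            chunk  # no-op: only the read is being timed

def do_nothing_readlines(filename, nbytes):
    with open(filename, 'r') as lines:
        while True:
            # nbytes is a size hint, so the number of lines per chunk varies.
            bytes_lines = lines.readlines(nbytes)
            if not bytes_lines:
                break
            bytes_lines  # no-op: only the read is being timed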
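The caveat about not getting exactly 10000 lines each time can be seen directly on a small input. This is only a sketch, assuming Python 3, where the hint is counted in characters for text streams. `io.StringIO` is a stand-in for the file, and the hint of 100 is an arbitrary value chosen for illustration.

```python
import io

# Lines "0\n" through "999\n": line lengths grow from 2 to 4 characters.
text = "".join("%d\n" % i for i in range(1000))
fake_file = io.StringIO(text)

sizes = []      # number of lines returned by each readlines() call
collected = []  # every line, in the order it was read
while True:
    chunk = fake_file.readlines(100)  # stop once roughly 100 characters are read
    if not chunk:
        break
    sizes.append(len(chunk))
    collected.extend(chunk)

print("lines per chunk (first five):", sizes[:5])
print("lines per chunk (last):", sizes[-1])
```

Because the hint bounds the total size rather than the line count, chunks made of short lines contain more lines than chunks made of long ones. No line is lost or split either way.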
Testing:
>>> %timeit do_nothing_islice('lines.txt', 1000)
10 loops, best of 3: 63.6 ms per loop
>>> %timeit do_nothing_readlines('lines.txt', 7000) # 7-byte lines, ish
10 loops, best of 3: 56.8 ms per loop
>>> %timeit do_nothing_islice('lines.txt', 10000)
10 loops, best of 3: 58.4 ms per loop
>>> %timeit do_nothing_readlines('lines.txt', 70000) # 7-byte lines, ish
10 loops, best of 3: 50.7 ms per loop
>>> %timeit do_nothing_islice('lines.txt', 100000)
10 loops, best of 3: 76.1 ms per loop
>>> %timeit do_nothing_readlines('lines.txt', 700000) # 7-byte lines, ish
10 loops, best of 3: 70.1 ms per loop
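On a file with an average line length of 7 (the numbers 0 -> 1000000, printed one per line), using readlines with a size hint is a bit faster. But only a bit. Also note the odd scaling; I don't understand what's happening there.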
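To rerun this comparison outside IPython, something like the following sketch could be used. It assumes Python 3 and uses the standard `timeit` module instead of `%timeit`. The names `make_lines_file`, `count_islice`, `count_readlines` and `time_both` are invented here, and the file is smaller than the one timed above. The functions also count lines so the two approaches can be checked against each other, which adds a little work that the `do_nothing_*` versions don't do. Absolute numbers will therefore differ.

```python
import itertools
import os
import tempfile
import timeit


def make_lines_file(n_lines):
    """Write the integers 0..n_lines-1, one per line, to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.writelines("%d\n" % i for i in range(n_lines))
    return path


def count_islice(filename, nlines):
    """Read the file in chunks of nlines lines; return the total line count."""
    total = 0
    with open(filename, "r") as lines:
        for chunk in iter(lambda: list(itertools.islice(lines, nlines)), []):
            total += len(chunk)
    return total


def count_readlines(filename, nbytes):
    """Read the file with readlines(nbytes) as a size hint; return the total line count."""
    total = 0
    with open(filename, "r") as lines:
        while True:
            chunk = lines.readlines(nbytes)
            if not chunk:
                break
            total += len(chunk)
    return total


def time_both(filename, nlines, nbytes, number=3):
    """Return the average seconds per call for (islice, readlines)."""
    t_islice = timeit.timeit(lambda: count_islice(filename, nlines), number=number)
    t_readlines = timeit.timeit(lambda: count_readlines(filename, nbytes), number=number)
    return t_islice / number, t_readlines / number


path = make_lines_file(100000)
try:
    n_islice = count_islice(path, 1000)
    n_readlines = count_readlines(path, 7000)
    t_islice, t_readlines = time_both(path, 1000, 7000)
    print("islice:    %.4f s per call" % t_islice)
    print("readlines: %.4f s per call" % t_readlines)
finally:
    os.remove(path)  # clean up the temporary file
```

Varying `nlines` and `nbytes` together, as in the session above, is one way to check whether the odd scaling shows up on another machine.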