python - Python islice is reading the same lines

Question

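I have a large log file (> 1 GB) that needs to be analyzed, so I wrote a Python program. I used islice so that I can read the file in chunks (10,000 lines) and my server does not run out of memory.

I looked up some islice solutions on Stack Overflow and implemented one, but the program does not work as expected because islice reads the same lines every time (although it stops correctly after reading the whole file...). I cannot use with open because it was introduced in Python 2.5 and I have Python 2.4...

My code looks like this: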

    n = 100000;     # n lines
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        out_fn = produce_clean_logfile(next_n_lines)
        a, t = main(out_fn)
        send_log(a,t)

Do you know what is going wrong?

Thanks in advance. Regards, John.

score 3 · Accepted Answer

    from itertools import islice
    n = 2;     # n lines
    fn = "myfile"
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        print next_n_lines

This works for me on Python 2.5, 2.6, and 2.7: I can see the lines displayed in order.

The bug must come from your other functions. Can you update your question?
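
To check the islice loop in isolation from the rest of the program, something like the sketch below may help. It wraps the same pattern in a generator and runs it on a small temporary file. The name `iter_chunks` and the sample data are made up for illustration, and the file is closed with try/finally around the loop rather than `with`, on the assumption that this has to stay usable on Python 2.4.

```python
from itertools import islice
import os
import tempfile


def iter_chunks(lines, n):
    """Yield successive lists of up to n items taken from `lines`."""
    it = iter(lines)  # one shared iterator, so every islice call continues where the last stopped
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            break
        yield chunk


# Build a small throwaway file with five lines (illustrative data only).
fd, path = tempfile.mkstemp()
os.close(fd)
out = open(path, "w")
out.write("".join(["line %d\n" % i for i in range(5)]))
out.close()

inf = open(path, "r")
try:
    chunks = list(iter_chunks(inf, 2))
finally:
    # Explicit cleanup instead of a `with` block.
    inf.close()
    os.remove(path)
```

If the chunks come out in order here but not in the real program, that points at the code that consumes each chunk rather than at islice.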

score 2 · Accepted Answer

You can use groupby for this:

    from itertools import groupby, count
    with open(filename, 'r') as datafile:
        groups = groupby(datafile, key=lambda k, line=count(): next(line)//10000)
        for k, group in groups:
            for line in group:
                ...
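
Note that the `with` statement needs Python 2.5 or later and the built-in `next()` needs 2.6, so the snippet above would not run unchanged on Python 2.4. A variant of the same grouping idea that avoids both might look like the sketch below. It is not the code from this answer: it numbers the lines with `enumerate` instead of a `count()` default argument, and `grouped_chunks` and the sample list are hypothetical names used only for the demonstration.

```python
from itertools import groupby


def grouped_chunks(lines, n):
    """Yield lists of up to n consecutive items, grouped by index // n."""
    # enumerate pairs each line with its index; integer division by n
    # gives the same key to every run of n consecutive lines.
    for _, group in groupby(enumerate(lines), key=lambda pair: pair[0] // n):
        yield [line for _, line in group]


# Any iterable of lines works here, including an open file object.
sample = ["a\n", "b\n", "c\n", "d\n", "e\n"]
sizes = [len(chunk) for chunk in grouped_chunks(sample, 2)]
```

With a real file, the loop over `grouped_chunks(inf, 10000)` would sit inside a try/finally that closes `inf`, the same way as with the islice loop.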
