python - Python: itertools.islice not working in a loop

Question

I have code like this:

# f is an already opened file
goto_line = num_lines  # total number of lines
while not found:
    line_str = next(itertools.islice(f, goto_line - 1, goto_line))
    goto_line = goto_line // 2  # integer division, so islice gets an int
    # checks for data, sets found to True if needed

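line_str is correct on the first pass, but every pass after that reads a different line than expected.

For example, goto_line starts at 1000, and line 1000 is read just fine. On the next iteration goto_line is 500, but line 500 is not what gets read. Instead, some line close to 1000 is read.

I am trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backward to a line and sometimes forward.

I did try linecache, but I usually don't run this code more than once on the same file.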

score 5 · Accepted Answer

Python iterators can only be consumed once. This is easiest to see with an example. The following code

from itertools import islice
a = range(10)
i = iter(a)  # one shared iterator; every islice call below advances it
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))

prints

[1, 2]
[4, 5]
[7, 8]
[]

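Each slice starts where the previous one stopped.

The simplest way to make your code work is to call f.readlines() to get a list of the lines in the file and then use ordinary Python list slicing [i:j]. If you really want to use islice(), you could rewind with f.seek(0) and read the file from the beginning each time, but this will be very inefficient.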
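
As a rough sketch of the f.seek(0) variant, assuming a seekable file: the helper name read_line is made up for illustration, and io.StringIO stands in for the real opened file f. Every call rescans from the top, which is why this gets slow on large files.

```python
import io
from itertools import islice

def read_line(f, line_no):
    """Return line number line_no (1-based) by rewinding and skipping ahead.

    Hypothetical helper for illustration; f must be seekable.
    """
    f.seek(0)  # rewind so islice counts from the first line again
    return next(islice(f, line_no - 1, line_no))

# io.StringIO stands in for the opened file f (an assumption for the demo)
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))

goto_line = 1000
first = read_line(f, goto_line)
goto_line //= 2
second = read_line(f, goto_line)
print(first.strip(), second.strip())
```

With the readlines() approach the equivalent lookup would simply be lines[goto_line - 1] on the list, at the cost of holding the whole file in memory.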

score 0 · Accepted Answer

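You can't go backward in a file this way (there may be some way, depending on how the file was opened). The standard file iterator only moves forward; in fact most iterators do, since Python's iterator protocol only supports forward iteration. So after reading k lines, skipping ahead by another k/2 lines actually gives you line k + k/2.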
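
To make the k + k/2 effect concrete, here is a small sketch. It assumes a made-up 2000-line file held in io.StringIO, so that line 1500 actually exists.

```python
import io
from itertools import islice

# Hypothetical 2000-line "file" for the demonstration
f = io.StringIO("".join("line %d\n" % n for n in range(1, 2001)))

first = next(islice(f, 999, 1000))  # consumes lines 1..1000, returns line 1000
second = next(islice(f, 499, 500))  # skips 499 MORE lines, then returns one

print(first.strip())   # line 1000
print(second.strip())  # line 1500, not line 500
```

If the file had only 1000 lines, the second call would raise StopIteration, because nothing would be left to read.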

You could try reading the whole file into memory, but you have a lot of data, so memory consumption probably becomes an issue. You could use file.seek to move around in the file, but that's still a lot of work. Perhaps you could use a memory-mapped file? That only lets you jump straight to a given line if the lines are fixed-size, though. If necessary, you could pre-calculate the line numbers you'd like to check and save all those lines in one iteration (it shouldn't be too many, roughly int(log_2(line_count)) + 1 if I'm not mistaken), so you don't have to scroll back after reading the whole file.
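
A minimal sketch of that last idea, assuming the candidate line numbers come from repeated halving as in the question. The names halving_targets and collect_lines are invented for illustration, line_count is assumed to be at least 1, and io.StringIO again stands in for the real file.

```python
import io

def halving_targets(line_count):
    """Line numbers visited by repeated halving: n, n//2, n//4, ..., 1."""
    targets = []
    n = line_count
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets

def collect_lines(f, wanted):
    """Read f once from its current position and keep only the wanted lines.

    wanted holds 1-based line numbers and must not be empty.
    """
    wanted = set(wanted)
    last = max(wanted)
    found = {}
    for line_no, line in enumerate(f, start=1):
        if line_no in wanted:
            found[line_no] = line
        if line_no >= last:
            break  # nothing of interest past the largest target
    return found

# io.StringIO stands in for the opened file (an assumption for the demo)
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))
targets = halving_targets(1000)
lines = collect_lines(f, targets)
print(targets)
print(lines[500].strip())
```

The check for data can then walk over the saved lines in whatever order it needs, since they are all in an ordinary dict.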
