python - Why can I only use a reader object once?

Question

I'm trying to read a column of numbers into Python using the csv module. I get the following behavior:

import csv

f = open('myfile.txt', 'r')
reader = csv.reader(f)
print([x for x in reader])  # This outputs the contents of "myfile.txt",
                            # broken up by line.
print([x for x in reader])  # This line prints an empty list.

Why is this happening? Is there some reason the reader object can only be used once?

score 3 · Accepted Answer

It happens for the same reason as here:

>>> li=[1,2,3,4,5,6,7,8,9]
>>> it=iter(li)
>>> print([x for x in it], [x for x in it])
[1, 2, 3, 4, 5, 6, 7, 8, 9] []

Note the empty list...

csv.reader is an iterator: it produces items from a container or sequence one at a time until a StopIteration exception signals that there are no more items.
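To make the exhaustion visible, here is a minimal Python 3 sketch. It assumes an in-memory io.StringIO buffer as a stand-in for "myfile.txt" (the contents are made up for illustration), so it runs without any file on disk:

```python
import csv
import io

# In-memory stand-in for "myfile.txt"; the contents are invented for this sketch.
f = io.StringIO("1\n2\n3\n")
reader = csv.reader(f)

first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # the reader is already exhausted

# Asking an exhausted iterator for another item raises StopIteration.
try:
    next(reader)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_pass)   # [['1'], ['2'], ['3']]
print(second_pass)  # []
print(exhausted)    # True
```

The list comprehension in the question hides the StopIteration, which is why the second pass silently yields an empty list instead of an error.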

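For built-in types (and every library type I know of, such as csv), iteration is one-way. The only ways to "go back" are to keep the items you are interested in or to recreate the iterator.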

I suppose you could hack/fool csv.reader by seeking backwards in the file, but why would you?

If you need to, you can make a copy of the iterator's contents:

>>> it=iter(li)       # fresh iterator; the one above is already exhausted
>>> it_copy=list(it)
>>> print([x for x in it_copy], [x for x in it_copy])
[1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4, 5, 6, 7, 8, 9]

Or use itertools.tee, as Mark Ransom notes.

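The best approach is to design your algorithm around a single one-way trip through the iterator. That uses less memory and is often faster.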
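As a rough illustration of the single-pass idea, the sketch below computes several statistics over a column of numbers in one loop instead of iterating the reader twice. The data and variable names are hypothetical, and an io.StringIO buffer again stands in for the file:

```python
import csv
import io

# Hypothetical one-column input; a real script would open a file instead.
f = io.StringIO("4\n8\n15\n16\n23\n42\n")

count = 0
total = 0.0
largest = None
for row in csv.reader(f):
    if not row:              # csv.reader yields [] for blank lines
        continue
    value = float(row[0])
    count += 1
    total += value
    if largest is None or value > largest:
        largest = value

mean = total / count if count else None
print(count, mean, largest)  # 6 18.0 42.0
```

Only one row is held at a time, so memory use stays flat no matter how long the file is.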

score 2 · Accepted Answer

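The reason you can only go one way is that the file you passed in only goes one way. If you want to loop over the csv file again, you can do something like this: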

>>> import csv
>>> with open("output.csv", 'r') as f:
...     r = csv.reader(f)
...     for l in r:
...         print(l)
...     f.seek(0)  # rewind the underlying file to the start
...     for l in r:
...         print(l)

This is a pretty poor explanation. Unfortunately I don't know the proper term for "only goes one way"; maybe someone else can help me with my vocabulary...

score 1 · Accepted Answer

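The reader object is an iterator, and by definition an iterator object can only be used once. Once it has finished iterating, you can't get anything more out of it.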

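You can use itertools.tee to split the iterator into two copies, each of which can be used independently and returns the same data. Unfortunately, if you don't consume both copies at the same time, the data gets buffered in memory, and you may run out of memory.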

import csv
import itertools

f = open('myfile.txt', 'r')
reader = csv.reader(f)
reader1, reader2 = itertools.tee(reader)
print([x for x in reader1])  # This outputs the contents of "myfile.txt"
print([x for x in reader2])  # This line prints the same thing.
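The memory caveat can be sketched as follows, assuming Python 3 and an in-memory buffer in place of "myfile.txt". Advancing the two copies in lockstep keeps tee's internal buffer small. When one pass has to finish before the next begins, tee ends up holding every row anyway, so a plain list is usually the simpler choice:

```python
import csv
import io
import itertools

# In-memory stand-in for "myfile.txt"; contents are invented for this sketch.
f = io.StringIO("1\n2\n3\n")
reader1, reader2 = itertools.tee(csv.reader(f))

# zip advances both copies together, so tee buffers at most one row.
pairs = list(zip(reader1, reader2))
print(pairs)  # [(['1'], ['1']), (['2'], ['2']), (['3'], ['3'])]

# For two separate full passes, materialize the rows once and reuse the list.
f.seek(0)
rows = list(csv.reader(f))
print(rows)   # [['1'], ['2'], ['3']]
print(rows)   # a list can be iterated as often as needed
```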

score 1 · Accepted Answer

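As you read, you fetch the rows one at a time. When you have finished reading, you are at the end of the file. You need to reset the file object's read position to the beginning.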

f.seek(0)
print([x for x in reader])
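A self-contained way to try the rewind, assuming Python 3 and an io.StringIO buffer standing in for the real file. The same reader object is reused here because it simply pulls lines from the underlying file object:

```python
import csv
import io

# In-memory stand-in for the file; contents are invented for this sketch.
f = io.StringIO("1\n2\n3\n")
reader = csv.reader(f)

first = [x for x in reader]
f.seek(0)                    # move the read position back to the start
second = [x for x in reader]

print(first)   # [['1'], ['2'], ['3']]
print(second)  # [['1'], ['2'], ['3']]
```

Creating a fresh csv.reader(f) after the seek works as well and resets the reader's own bookkeeping, such as line_num.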
