python - Looping through files, skipping lines, and then reading gives an error

Question

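I have multiple directories, each containing many files, and I want to iterate over every one of them. I also want to read only the 5th line of each file, ignoring the first four lines. The script runs fine when I don't try to skip the first 4 lines. Here is the code: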

import os

#find the present working directory
pwd=os.path.dirname(os.path.abspath(__file__))

#find all the folders in the present working directory.
dirs = [f for f in os.listdir('.') if os.path.isdir(f)]

for directory in dirs:
        os.chdir(os.path.join(pwd, directory));
        chd_dir = os.path.dirname(os.path.abspath(__file__))
        files = [ fl for fl in os.listdir('.') if os.path.isfile(fl) ]
        print files
        for f in files:
                f_obj = open(os.path.join(chd_dir, f), 'r')
                for i in xrange(0,4): #ignore the first 4 lines
                        f_obj.next()
                s=f_obj.readline()
                print s
                f_obj.close()

This script gives the following error: ValueError: Mixing iteration and read methods would lose data

I don't understand why Python thinks I would lose data. I would also like to know how to fix it, and why the fix works.

score 2 · Accepted Answer

You can read the 5th line by calling .next() once more:

s = f_obj.next()

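The file iteration methods use a buffer to stay efficient, and that buffer is not shared with .readline() and the other read methods of the file object. In the script above, the .next() calls have already pulled more than four lines into that buffer, so a following .readline() would start from the wrong position in the file. That is why mixing iteration and read methods would lose data, and why Python raises the error instead.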

From the .next() method documentation:

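In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.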

You could also replace the .next() calls with .readline() calls. Just be consistent and use one or the other.
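As a minimal sketch of the "use one or the other" advice, the two helpers below each read the 5th line using a single access style. It is written in Python 3 syntax, where the built-in next(f_obj) takes the place of f_obj.next(). The helper names read_nth_line_iter and read_nth_line_readline and the file demo.txt are hypothetical and only serve the illustration.

```python
import itertools
import os
import tempfile


def read_nth_line_iter(path, n=5):
    """Return line n (1-based) using iteration only; None if the file is shorter."""
    with open(path, 'r') as f_obj:
        # islice consumes the first n-1 lines through the iterator protocol,
        # so no read method such as readline() is mixed in.
        return next(itertools.islice(f_obj, n - 1, None), None)


def read_nth_line_readline(path, n=5):
    """Return line n (1-based) using readline() only; '' if the file is shorter."""
    with open(path, 'r') as f_obj:
        for _ in range(n - 1):
            f_obj.readline()  # skip a line, never touching the iterator
        return f_obj.readline()


# Small demonstration on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.txt')
    with open(demo_path, 'w') as out:
        out.write('a\nb\nc\nd\ne\nf\n')
    fifth_iter = read_nth_line_iter(demo_path)
    fifth_readline = read_nth_line_readline(demo_path)
    missing = read_nth_line_iter(demo_path, n=10)
```

Both helpers return the same 5th line here. The with block also closes the file when an exception occurs, which the explicit f_obj.close() in the question does not guarantee.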
