python - How to read the first N lines of a file?

Question

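We have a large raw data file that we would like to trim to a specified size.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?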

score 300 · Accepted Answer

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Here's another way (Python 2 & 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
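
Since the question is about trimming a large file, here is a minimal Python 3 sketch that streams the first N lines into a new file instead of collecting them in a list. The function name `trim_file` and its parameters are made up for illustration and are not part of the answer above.

```python
from itertools import islice


def trim_file(src_path, dst_path, n):
    """Copy at most the first n lines of src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops after n lines, or earlier if the file is shorter,
        # so the rest of the source file is never read.
        dst.writelines(islice(src, n))
```

Because `islice` simply stops when the file runs out, a source file with fewer than N lines is copied whole and no exception is raised.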

score 23 · Accepted Answer

N = 10
with open("file.txt", "r") as file:  # "r" opens it for reading; append mode ("a") cannot be read from
    for i in range(N):
        line = next(file).strip()
        print(line)
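
A loop built on `next(file)` raises StopIteration if the file has fewer than N lines. One way around that, sketched here under the assumption that a short file should just yield what it has, is to give `next()` a default value. The function name `print_head` is hypothetical.

```python
def print_head(path, n):
    """Print up to n stripped lines and return them as a list."""
    lines = []
    with open(path) as f:
        for _ in range(n):
            # With a default, next() returns None at end of file
            # instead of raising StopIteration.
            line = next(f, None)
            if line is None:
                break
            lines.append(line.strip())
            print(lines[-1])
    return lines
```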

score 19 · Accepted Answer

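If you want to read the first lines quickly and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

E.g. for the first 5 lines: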

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

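Note: the whole file is read, so this is not the best from a performance point of view, but it is easy to use, quick to write and easy to remember, so it is very convenient if you just want to perform some one-time calculation.

print(firstNlines)

One advantage compared to the other answers is the ability to easily select a range of lines, e.g. skipping the first 10 lines [10:30], leaving out the last 10 [:-10], or taking only even lines [::2].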
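
If reading the whole file is a concern, ranges with non-negative bounds can also be taken lazily with `itertools.islice`. This is only a sketch, and the function name `line_range` is invented for illustration. Note that `islice` does not accept negative indices, so a slice like [:-10] still needs a different approach.

```python
from itertools import islice


def line_range(path, start, stop, step=1):
    """Return lines[start:stop:step] without reading past line `stop`."""
    with open(path) as f:
        # stop may be None, meaning "until the end of the file".
        return list(islice(f, start, stop, step))
```

For example, `line_range(path, 10, 30)` corresponds to `readlines()[10:30]`, and `line_range(path, 0, None, 2)` to `readlines()[::2]`.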

score 11 · Accepted Answer

What I do is read the N lines using pandas. I think the performance is not the best, but for example if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)

score 7 · Accepted Answer

File objects do not expose a dedicated method for reading a given number of lines.

I guess the easiest way would be the following:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
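
One detail worth knowing: `readline()` returns an empty string at end of file, so with a file shorter than N lines the list above ends up padded with empty strings. A possible Python 3 variation, offered as a sketch with the hypothetical name `read_head`, stops at the first empty string instead.

```python
from itertools import islice


def read_head(file_name, n):
    """Return up to n lines, stopping cleanly at end of file."""
    with open(file_name) as f:
        # iter(callable, sentinel) keeps calling f.readline() until it
        # returns "", which is what readline() gives back at end of file.
        return list(islice(iter(f.readline, ""), n))
```

A blank line inside the file is read as "\n", not "", so it does not end the loop early.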

score 5 · Accepted Answer

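The two most intuitive ways of doing this would be:

Iterate over the file line by line, and break after N lines.
Iterate over the file line by line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: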

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.

score 4 · Accepted Answer

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
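
The class above subclasses the built-in `file` type, which exists only in Python 2. As a rough Python 3 counterpart, here is a sketch using two plain functions. It swaps the seek-based tail for a simpler technique: a `collections.deque` with `maxlen`, which reads the file once from the start but keeps only the last n lines in memory.

```python
from collections import deque
from itertools import islice


def head(path, n=1):
    """Return up to the first n lines of the file."""
    with open(path) as f:
        return list(islice(f, n))


def tail(path, n=1):
    """Return up to the last n lines of the file."""
    with open(path) as f:
        # A deque with maxlen=n discards older lines as new ones arrive,
        # so only the n most recent lines are kept.
        return list(deque(f, maxlen=n))
```

For very large files the seek-from-the-end approach in the class above avoids scanning the whole file, which this simpler version does not.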

score 3 · Accepted Answer

The most convenient way, for my own use:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

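This is a solution based on a list comprehension. The open() function supports an iteration interface. enumerate() wraps open() and returns tuples (index, item); we then check that we are inside the accepted range (if i < LINE_COUNT) and simply print the result.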
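
One caveat: a comprehension filtered with `if i < LINE_COUNT` still iterates over every line of the file, it just discards the later ones. A variation that stops early, sketched here with the hypothetical name `first_lines`, pairs the file with a `range` through `zip`.

```python
def first_lines(path, line_count):
    """Return up to line_count lines, stopping as soon as they are read."""
    with open(path) as f:
        # zip ends when range(line_count) is exhausted, so no further
        # lines are pulled from the file after that point.
        return [line for _, line in zip(range(line_count), f)]
```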

Enjoy Python. ;)

score 3 · Accepted Answer

For the first 5 lines, simply do:

N=5
with open("data_file", "r") as file:
    for i in range(N):
       print file.next()

score 2 · Accepted Answer

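If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except, and works on a fair range of Python 2.x versions (2.2 to 2.6):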

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

score 2 · Accepted Answer

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. This is so much better in my experience:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''

    rows = []  # unknown number of lines, so use list

    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                # parse one row of space-separated numbers
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

score 1 · Accepted Answer

This worked for me

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)

score 1 · Accepted Answer

I wanted to handle files with fewer than n lines by reading the whole file

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

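Credit goes to John La Rooy and Ilian Iliev. Use the function with exception handling for the best performance.

Revision 1: Thanks to FrankM for the feedback. To handle file existence and read permission, we can further add: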

import errno
import os

def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)  
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)     
   
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

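You can either use the second version, or go with the first one and handle the file exceptions later. From a performance standpoint, the checks are quick and mostly free.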
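
As an alternative sketch, not the approach of this answer: `itertools.islice` already stops at end of file, so the file does not need to be reopened for the short-file case, and `open()` raises FileNotFoundError or PermissionError by itself, so the caller can catch OSError. The wrapper name `safe_head` is hypothetical.

```python
from itertools import islice


def head(filename, n):
    """Return up to n lines with trailing whitespace stripped."""
    # open() raises FileNotFoundError / PermissionError on its own.
    with open(filename) as f:
        # islice ends quietly if the file has fewer than n lines.
        return [line.rstrip() for line in islice(f, n)]


def safe_head(filename, n):
    """Like head(), but report unreadable files and return an empty list."""
    try:
        return head(filename, n)
    except OSError as exc:
        print(f"cannot read {filename}: {exc}")
        return []
```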

score 1 · Accepted Answer

Here's another decent solution with a list comprehension:

file = open('file.txt', 'r')

lines = [next(file) for x in range(3)]  # first 3 lines will be in this list

file.close()

score 0 · Accepted Answer

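Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class, so the top-rated answer above might be rewritten as:

with open("datafile") as myfile:
    head = myfile.readlines(N)
print head

(You don't have to worry about the file having fewer than N lines, since no StopIteration exception is thrown. Be aware, though, that the argument to readlines() is a size hint counted in bytes or characters, not a line count, so this returns exactly the first N lines only by coincidence.)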
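
To see how the argument of `readlines()` behaves, the following small Python 3 sketch compares it with `itertools.islice` on an in-memory file. The sample text and variable names are made up for illustration.

```python
import io
from itertools import islice

text = "first line\nsecond line\nthird line\nfourth line\n"
N = 3

# readlines(N) stops once the lines read so far exceed N characters in total.
by_hint = io.StringIO(text).readlines(N)

# islice(..., N) stops after N lines.
by_count = list(islice(io.StringIO(text), N))

print(len(by_hint), len(by_count))
```

Here the first line alone is already longer than 3 characters, so `readlines(3)` returns a single line while `islice` returns three.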

score 0 · Accepted Answer

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)

score 0 · Accepted Answer


fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f: #lines count
    for line in f:
        num_lines += 1

num_lines_input = int (input("Enter line numbers: "))

with open(fname, "r") as f:
    # Read no more lines than the file actually has.
    for x in range(min(num_lines_input, num_lines)):
        a = f.readline()
        print(a)

if num_lines_input > num_lines:
    print("Don't have", num_lines_input, "lines; printed as many as available")


print("Total lines in the text",num_lines)

score -2 · Accepted Answer

#!/usr/bin/python

import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print  output

This method worked for me

score -2 · Accepted Answer

Simply convert your CSV file object to a list using list(file_data)

import csv
with open('your_csv_file.csv') as file_obj:
    file_data = csv.reader(file_obj)
    file_list = list(file_data)  # reads every row of the file
    for row in file_list[:4]:
        print(row)
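
Converting the reader to a list parses the entire CSV file. If only the first few rows are needed, a `csv.reader` can be sliced lazily like any other iterator. This is a sketch, and the function name `first_rows` is invented for illustration.

```python
import csv
from itertools import islice


def first_rows(csv_path, n):
    """Return up to the first n parsed rows of a CSV file."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader.
    with open(csv_path, newline="") as file_obj:
        return list(islice(csv.reader(file_obj), n))
```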
