python - How to read specific lines from a file (by line number)?

Question

I'm using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?

score 310 · Accepted Answer

If the file to read is big, and you don't want to read the whole file into memory at once:

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass  # 26th line: process line here
    elif i == 29:
        pass  # 30th line: process line here
    elif i > 29:
        break
fp.close()

Note that i == n-1 for the nth line.

In Python 2.6 or later:

with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            pass  # 26th line: process line here
        elif i == 29:
            pass  # 30th line: process line here
        elif i > 29:
            break
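The same early-exit loop can be wrapped in a small reusable function. This is a sketch rather than part of the original answer; the name read_lines_by_number and its 1-based numbering are assumptions made here for illustration, and an in-memory io.StringIO stands in for an open file:

```python
import io


def read_lines_by_number(fp, wanted):
    """Hypothetical helper: return {line_number: line} for the 1-based
    line numbers in `wanted`.

    Reads `fp` sequentially and stops as soon as the highest requested
    line has been seen, so lines after it are never read.
    """
    wanted = set(wanted)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    for i, line in enumerate(fp, start=1):
        if i in wanted:
            found[i] = line
        if i >= last:
            break  # nothing left to look for
    return found


# In-memory stand-in for a 40-line file: "line 1\n" ... "line 40\n".
sample = io.StringIO("".join(f"line {n}\n" for n in range(1, 41)))
result = read_lines_by_number(sample, [26, 30])
print(result)  # {26: 'line 26\n', 30: 'line 30\n'}
```

With a real file, call it inside the with block as read_lines_by_number(fp, [26, 30]); the file position is left just after line 30.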

score 189 · Accepted Answer

Quick answer:

with open('filename') as f:
    lines = f.readlines()
print(lines[25])  # 26th line
print(lines[29])  # 30th line

Or:

lines = [25, 29]  # zero-based indices of lines 26 and 30
i = 0
f = open('filename')
for line in f:
    if i in lines:
        print(line)
    i += 1
f.close()

There is a more elegant solution for extracting many lines: linecache (courtesy of "python: how to jump to a specific line in a large text file?", a previous stackoverflow.com question).

Quoting the Python documentation linked above:

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

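Change the 4 to your desired line number, and you're done. Note that linecache counts lines from 1, so 4 returns the fourth line (unlike list indexing, which is zero-based).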
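The numbering can be checked with a throwaway file. This sketch uses only the standard library; the three-line temporary file is invented for the demo. It shows that linecache.getline counts from 1 and returns an empty string, rather than raising, when the line number is out of range:

```python
import linecache
import os
import tempfile

# Write a small temporary file with three known lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

try:
    # Line numbers 0 and 99 are out of range for this 3-line file.
    results = {n: linecache.getline(path, n) for n in (0, 1, 3, 99)}
finally:
    linecache.clearcache()  # linecache keeps whole files in memory; drop the cached copy
    os.remove(path)

print(results)  # {0: '', 1: 'first\n', 3: 'third\n', 99: ''}
```

Because linecache reads and caches the entire file on first use, it suits repeated lookups but saves no memory on huge files; linecache.checkcache() discards cached entries for files that changed on disk.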

If the file might be very large and cause problems when read into memory, it might be a good idea to take @Alok's advice and use enumerate().

To summarize:

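Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
Use linecache for a more elegant solution, which will be quite fast for reading many files, possibly repeatedly.
Take @Alok's advice and use enumerate() for files which could be very large and won't fit into memory. Note that this method might be slower because the file is read sequentially.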

score 37 · Accepted Answer

To offer another solution:

import linecache
line_number = 26  # 1-based: this is the 26th line
line = linecache.getline('Sample.txt', line_number)

I hope this is quick and easy :)

score 34 · Accepted Answer

A fast and compact approach could be:

def picklines(thefile, whatlines):
  return [x for i, x in enumerate(thefile) if i in whatlines]

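This accepts any open file-like object thefile (leaving it up to the caller whether it should be opened from a disk file, or via e.g. a socket or other file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator: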

def yieldlines(thefile, whatlines):
  return (x for i, x in enumerate(thefile) if i in whatlines)

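which is basically only good for looping over. Note that the only difference is the use of round rather than square brackets in the return statement: square brackets make a list comprehension, round ones a generator expression.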

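Further note that, despite the mention of "lines" and "file", these functions are much more general: they work on any iterable, be it an open file or anything else, returning a list (or generator) of items based on their progressive item numbers. So I'd suggest using more appropriately general names ;-).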
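A quick check of that generality, using the two functions above unchanged (repeated here so the snippet runs on its own). The inputs, an in-memory io.StringIO and a plain range, are chosen only for illustration:

```python
import io


def picklines(thefile, whatlines):
    return [x for i, x in enumerate(thefile) if i in whatlines]


def yieldlines(thefile, whatlines):
    return (x for i, x in enumerate(thefile) if i in whatlines)


text = io.StringIO("".join(f"row {n}\n" for n in range(1, 41)))
# Zero-based indices 25 and 29 are the 26th and 30th lines.
# Passing a set keeps each `in` test O(1).
print(picklines(text, {25, 29}))                  # ['row 26\n', 'row 30\n']

# Any iterable works, not just files.
print(list(yieldlines(range(100, 110), {0, 9})))  # [100, 109]
```

Both functions keep scanning to the end of the iterable even after the last wanted index has been passed; for long files an explicit break, as in the enumerate answers, avoids that.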

score 15 · Accepted Answer

For the sake of completeness, here is one more option.

Let's start with a definition from the Python docs:

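slice: An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or, in older versions, __getslice__() and __setslice__()).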

Though the slice notation is not directly applicable to iterators in general, the itertools package contains a replacement function:

from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print(line)

# print every third line among the first 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print(line)

An additional advantage of this function is that it does not read the iterator to the end. So you can do more complex things:

with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print(line)

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print(line)

And to answer the original question:

# how to read lines #26 and #30
In [365]: list(islice(range(1, 100), 25, 30, 4))
Out[365]: [26, 30]
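Applied to a real file instead of a range of numbers, the same islice call returns the 26th and 30th lines directly. A minimal sketch; the helper name nth_line and the temporary demo file are illustrative assumptions:

```python
import os
import tempfile
from itertools import islice


def nth_line(path, n):
    """Hypothetical helper: return the n-th line (1-based, n >= 1) of the
    file at `path`, or None if the file has fewer than n lines."""
    with open(path) as f:
        # Skip n-1 lines, then take the next one; next() falls back to None at EOF.
        return next(islice(f, n - 1, None), None)


# Demo file with 40 numbered lines: "line 1\n" ... "line 40\n".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {k}\n" for k in range(1, 41))
    path = tmp.name

try:
    with open(path) as f:
        # start 25, stop 30, step 4 -> indices 25 and 29, i.e. lines 26 and 30
        pair = list(islice(f, 25, 30, 4))
    seventh = nth_line(path, 7)
    missing = nth_line(path, 100)
finally:
    os.remove(path)

print(pair)           # ['line 26\n', 'line 30\n']
print(repr(seventh))  # 'line 7\n'
print(missing)        # None
```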

score 14 · Accepted Answer

If you want the 7th line:

line = open("file.txt", "r").readlines()[6]  # index 6 is the 7th line

answered 2010-10-21T17:07:39.620

score 12 · Accepted Answer

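Reading files is incredibly fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the individual lines.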

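Most answers here are not wrong, but they are bad style. Files should always be opened using a with statement, as it makes sure that the file is closed again.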

So you should do it like this:

with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[25])  # line 26 (zero-based index); or whatever you want to do with this line
print(lines[29])  # line 30 (zero-based index); or whatever you want to do with this line

Huge files

In case you happen to have a huge file and memory consumption is a concern, you can process it line by line:

with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i

score 10 · Accepted Answer

Some of these are lovely, but it can be done much more simply:

start = 0 # some starting index
end = 5000 # some ending index
filename = 'test.txt' # some file we want to use

with open(filename) as fh:
    data = fh.readlines()[start:end]

print(data)

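This uses simple list slicing. It loads the whole file, but most systems will minimise memory usage appropriately; it's faster than most of the methods given above, and it works on my 10G+ data files. Good luck!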
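If loading the whole file is a concern, itertools.islice (used in another answer here) gives the same [start:end] slice while reading only up to end and keeping just the selected lines. A sketch under the answer's own meaning of start/end (zero-based, end exclusive); the helper name slice_lines is hypothetical, and unlike list slicing, islice does not accept negative indices:

```python
import io
from itertools import islice


def slice_lines(fh, start, end):
    """Hypothetical helper: same result as fh.readlines()[start:end] for
    non-negative start/end, but reads only up to line index `end` and
    never keeps the unselected lines in memory."""
    return list(islice(fh, start, end))


text = "".join(f"{n}\n" for n in range(10))  # stand-in for the file contents
fh = io.StringIO(text)
print(slice_lines(fh, 2, 5))  # ['2\n', '3\n', '4\n']
```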

score 6 · Accepted Answer

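If your large text file (the variable file) is strictly structured, meaning every line has the same length l including the newline, you can read the n-th line (counting from zero) with:

with open(file, 'rb') as f:  # binary mode, so seek() offsets are byte counts
    f.seek(n*l)               # n is zero-based; l includes the newline
    line = f.readline()       # bytes; decode if you need str
    last_pos = f.tell()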

Disclaimer: this only works for files whose lines all have the same length!
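A runnable sketch of that idea, under the stated assumption that every line has exactly the same length. The file is opened in binary mode so seek() offsets are plain byte counts, record_len is assumed to include the trailing newline, lines are numbered from 1 here, and the function name and the ASCII decoding are illustrative assumptions:

```python
import os
import tempfile


def fixed_width_line(path, n, record_len):
    """Hypothetical helper: return line n (1-based) of a file whose lines are
    all `record_len` bytes long, newline included, by jumping straight to
    its offset with seek() instead of scanning."""
    with open(path, "rb") as f:
        f.seek((n - 1) * record_len)
        return f.readline().decode("ascii")  # assumes ASCII content


# Demo: 100 lines, each "NNNN\n" = 5 bytes.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for i in range(1, 101):
        tmp.write(b"%04d\n" % i)
    path = tmp.name

try:
    line26 = fixed_width_line(path, 26, record_len=5)
finally:
    os.remove(path)

print(repr(line26))  # '0026\n'
```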

score 5 · Accepted Answer

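You can make a seek() call, which positions your read head at a specified byte within the file. This won't help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X bytes?), or, if you really want the speed boost, you could count the characters yourself (remember to include invisible characters like line breaks).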

Otherwise, you have to read every line before the one you want, as in one of the many solutions already proposed here.
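When lines vary in length but the same file will be queried many times, one common middle ground (not from this answer, so treat it as a suggestion) is to scan the file once, record the byte offset where each line starts, and then seek() straight to any line afterwards. A minimal sketch in binary mode; build_line_index, line_at and the demo file contents are hypothetical:

```python
import os
import tempfile


def build_line_index(path):
    """Scan the file once; entry k is the byte offset where line k+1 starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for raw in f:
            offsets.append(pos)
            pos += len(raw)
    return offsets


def line_at(f, offsets, n):
    """Return line n (1-based) from binary file object `f` using the index."""
    f.seek(offsets[n - 1])
    return f.readline()


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"short\na much longer line\n\nlast\n")
    path = tmp.name

try:
    index = build_line_index(path)
    with open(path, "rb") as f:
        second = line_at(f, index, 2)
        fourth = line_at(f, index, 4)
finally:
    os.remove(path)

print(index)           # [0, 6, 25, 26]
print(second, fourth)  # b'a much longer line\n' b'last\n'
```

The index costs one full read and a list of integers, after which each lookup is a single seek and readline.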

score 4 · Accepted Answer

def getitems(iterable, items):
  items = sorted(set(items))  # our own sorted, de-duplicated copy,
                              # since we modify it
  if items:
    for n, v in enumerate(iterable):
      if n == items[0]:
        yield v
        items.pop(0)
        if not items:
          break

with open("/usr/share/dict/words") as f:
  print(list(getitems(f, [25, 29])))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item

score 4 · Accepted Answer

with open("test.txt", "r") as fp:
   lines = fp.readlines()
print(lines[3])

test.txt is the file name.
This prints the fourth line of test.txt.

score 3 · Accepted Answer

How about this:

with open('a', 'r') as fin:
    lines = fin.readlines()
for i, line in enumerate(lines):
    if i > 29: break
    if i == 25: dox(line)  # 26th line; dox/doy stand for your own handlers
    if i == 29: doy(line)  # 30th line

score 3 · Accepted Answer

If you don't mind importing, fileinput does exactly what you need (it lets you read the line number of the current line).
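A minimal sketch of the fileinput route. The filelineno() method of the FileInput object (also available as the module-level fileinput.filelineno()) gives the 1-based number of the line just read within the current file; the demo file and the wanted set {26, 30} are illustrative:

```python
import fileinput
import os
import tempfile

# Demo file with 40 numbered lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {n}\n" for n in range(1, 41))
    path = tmp.name

wanted = {26, 30}
picked = []
try:
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            n = stream.filelineno()  # 1-based line number within the current file
            if n in wanted:
                picked.append((n, line))
            if n >= max(wanted):
                break  # stop reading once the last wanted line is reached
finally:
    os.remove(path)

print(picked)  # [(26, 'line 26\n'), (30, 'line 30\n')]
```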

score 3 · Accepted Answer

I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:

def read_specific_lines(file, lines_to_read):
   """file is any iterable; lines_to_read is an iterable containing int values"""
   lines = set(lines_to_read)
   last = max(lines)
   for n, line in enumerate(file):
      if n + 1 in lines:
          yield line
      if n + 1 > last:
          return

>>> with open(r'c:\temp\words.txt') as f:
        [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']

score 3 · Accepted Answer

Here are my 2 cents, for what it's worth ;)

def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp   = open(filename, "r")
    src  = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
    print("Line: %s\nData: %s\n" % (line[0], line[1]))

score 3 · Accepted Answer

A better, slightly modified version of Alok Singhal's answer:

fp = open("file")
for i, line in enumerate(fp, 1):
    if i == 26:
        pass  # 26th line: process line here
    elif i == 30:
        pass  # 30th line: process line here
    elif i > 30:
        break
fp.close()

score 3 · Accepted Answer

You can do this very simply with the syntax someone already mentioned, but this is by far the easiest way:

inputFile = open("lineNumbers.txt", "r")
lines = inputFile.readlines()
print (lines[0])
print (lines[2])

score 1 · Accepted Answer

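File objects have a .readlines() method which gives you a list of the file's contents, one line per list item. After that, you can use normal list slicing techniques.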

http://docs.python.org/library/stdtypes.html#file.readlines

score 1 · Accepted Answer

@OP, you can use enumerate:

for n, line in enumerate(open("file")):
    if n+1 in [26, 30]:  # or n in [25, 29]
        print(line.rstrip())

score 1 · Accepted Answer

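file = '/path/to/file_to_be_read.txt'
with open(file) as f:
    lines = f.readlines()  # read once: a second readlines() call would return [] (already at EOF)
    print(lines[25])  # line 26
    print(lines[29])  # line 30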

Using the with statement, this opens the file, prints lines 26 and 30, and then closes the file. Simple!

score 1 · Accepted Answer

To print line 3:

line_number = 3

with open(filename, "r") as file:
    current_line = 1
    for line in file:
        if current_line == line_number:
            print(line)  # the current line, not file.readline(), which would return the next one
            break
        current_line += 1

Original author: Frank Hofmann

score 1 · Accepted Answer

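Fairly quick and to the point.

This prints certain lines from a text file. Create a "lines2print" list, then print each line whose enumeration index is "in" the lines2print list. To get rid of the extra '\n', use line.strip() or line.strip('\n'). I just like list comprehensions and try to use them whenever I can. I like using the "with" method to read text files, so that the file is never left open for any reason.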

lines2print = [25, 29]  # zero-based indices of lines 26 and 30; can be a big list and order doesn't matter.

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei, x in enumerate(fp) if ei in lines2print]

Or, if the list is small, just type the list straight into the comprehension:

with open("filepath", 'r') as fp:
    [print(x.strip()) for ei, x in enumerate(fp) if ei in [25, 29]]

score 0 · Accepted Answer

Print the desired line, or a line above/below it:

def dline(file, no, add_sub=0):
    tf = open(file)
    for sno, line in enumerate(tf):
        if sno == no-1+add_sub:
            print(line)
    tf.close()

Usage: dline(r"D:\dummy.txt", 6), i.e. dline("file path", line_number, add_sub). add_sub is optional and defaults to 0; pass 1 to print the line below the requested one, or -1 for the line above it.

score 0 · Accepted Answer

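If you want to read specific lines, such as the lines after some threshold line, you can use the following code:

file = open("files.txt","r")
lines = file.readlines()  ## convert to a list of lines
file.close()
datas = lines[11:]  ## read the specific lines (index 11 onward, i.e. from the 12th line)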

score -1 · Accepted Answer

f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()
f = open(filename, 'r')

lineno = 1
while lineno <= totalLines:  # <= so the last line is read too
    line = f.readline()

    if lineno == 26:
        doLine26Commmand(line)

    elif lineno == 30:
        doLine30Commmand(line)

    lineno += 1
f.close()

score -1 · Accepted Answer

I think this will work:

open_file1 = open("E:\\test.txt",'r')
read_it1 = open_file1.read()
open_file1.close()
myline1 = []
for line1 in read_it1.splitlines():
    myline1.append(line1)
print(myline1[0])

score -3 · Accepted Answer

Reading from a specific line onward:

n = 4  # for reading from the 5th line
with open("write.txt", 'r') as t:
    for i, line in enumerate(t):
        if i >= n:  # i == n-1 for the nth line
            print(line)
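As an alternative, itertools.islice can skip the first n lines without an explicit index test on every line. A sketch assuming the same zero-based n; an in-memory io.StringIO stands in for open("write.txt") so the snippet runs on its own:

```python
import io
from itertools import islice

n = 4  # zero-based: start from the 5th line, as in the answer above

t = io.StringIO("".join(f"line {k}\n" for k in range(1, 8)))  # stand-in for open("write.txt")
rest = list(islice(t, n, None))  # skip n lines, keep everything after
print(rest)  # ['line 5\n', 'line 6\n', 'line 7\n']
```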
