ruby - How to efficiently read the nth line of a file in Ruby?

Question

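I have a 2 GiB file, and I want to read its first line. I can call File#readlines, which returns an array, and then use the [0] bracket syntax, at(0), slice(0), or the first method.

But there is a problem. My PC has 3.7 GiB of RAM, and memory usage climbs from 1.1 GiB all the way up to 3.7 GiB, because readlines loads every line of the file. All I want is the first line. Is there an efficient way to do this?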

score 0 · Accepted Answer


Have you tried readline instead of readlines?

File.open('file-name') { |f| f.readline }

Answered 2019-08-04T23:29:28.260
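One detail worth knowing about the readline approach: IO#readline raises EOFError on an empty file, while IO#gets returns nil. A minimal sketch of a nil-safe variant follows; `first_line` is a hypothetical helper name used only for illustration.

```ruby
# Returns the first line of the file at `path`, or nil if the file is empty.
# `first_line` is a hypothetical helper name, not part of Ruby's standard library.
def first_line(path)
  # IO#gets returns nil at end of file, whereas IO#readline raises EOFError.
  # The block form of File.open closes the file when the block exits.
  File.open(path) { |f| f.gets }
end
```

Either way only one line is read, so memory use does not depend on the size of the file.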

score 0 · Accepted Answer

How about IO.foreach?

IO.foreach('filename') { |line| p line; break }

That should read the first line, print it, and then stop. It does not read the whole file; it reads one line at a time.
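The same line-at-a-time idea extends to the nth line. Here is a rough sketch, assuming 1-based line numbers; `nth_line` is a hypothetical helper name, not something from the answer.

```ruby
# Returns line number `n` (1-based) of the file at `path`, or nil if the
# file has fewer than `n` lines. `nth_line` is a hypothetical helper name.
def nth_line(path, n)
  # Without a block, File.foreach returns an Enumerator; with_index(1)
  # numbers the lines starting at 1 as they are read.
  File.foreach(path).with_index(1) do |line, lineno|
    # Returning here stops the iteration, so later lines are never read.
    return line if lineno == n
  end
  nil
end
```

Only the current line is held in memory, but the lines before the target still have to be scanned, since line boundaries are not known in advance.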

score 0 · Accepted Answer

I would use the command line. For example, like this:

# Note: exec replaces the current Ruby process, so no Ruby code after this line runs.
exec("cat #{filename} | head -n #{nth_line} | tail -n 1")

I hope it works for you.
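If the result is needed back in Ruby rather than just printed, the shell pipeline can be captured instead. This is only a sketch, assuming `head` and `tail` are available on the PATH; `nth_line_via_shell` is a hypothetical name, and the file name is escaped because it is interpolated into a shell command.

```ruby
require 'open3'
require 'shellwords'

# Runs `head -n N FILE | tail -n 1` and returns its output as a String.
# `nth_line_via_shell` is a hypothetical helper name. Assumes a POSIX-like
# system with `head` and `tail` installed.
def nth_line_via_shell(path, n)
  count = Integer(n) # converts to an Integer or raises, so no arbitrary text reaches the shell
  command = "head -n #{count} #{Shellwords.escape(path)} | tail -n 1"
  output, status = Open3.capture2(command)
  raise "command failed: #{command}" unless status.success?
  # Caveat: if the file has fewer than `count` lines, this is its last line.
  output
end
```

Unlike exec, Open3.capture2 runs the command as a child process, so the Ruby script keeps running afterwards.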

score 0 · Accepted Answer

So here is some code that does the job quite efficiently.

First, we can use the `IO#each_line` method. Suppose we need line 3,000,000:

#!/usr/bin/ruby -w

file = File.open(File.join(__dir__, 'hello.txt'))
final = nil
read_upto = 3_000_000 - 1

file.each_line.with_index do |l, i|
    if i == read_upto
        final = l
        break
    end
end

file.close
p final

Running it with the shell's built-in time:

[I have a big hello.txt file in which every line is "#!/usr/bin/ruby -w #lineno"!!]

$ time ruby p.rb
"#!/usr/bin/ruby -w #3000000\n"

real    0m1.298s
user    0m1.240s
sys     0m0.043s
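The each_line script above opens and closes the file by hand. A variant using the block form of File.open and IO#lineno might look like the sketch below; `line_at` is a hypothetical name, and this is an alternative arrangement rather than the code that produced the timing above.

```ruby
# Returns line number `n` (1-based) of the file at `path`, or nil if the
# file is shorter. `line_at` is a hypothetical helper name.
def line_at(path, n)
  # The block form of File.open closes the file even on early return.
  File.open(path) do |file|
    file.each_line do |line|
      # IO#lineno is the number of lines read so far, so it equals the
      # 1-based number of the line just yielded.
      return line if file.lineno == n
    end
  end
  nil
end
```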

We can also get the first line very easily! You get the idea...

Second, extending anothermh's answer:

#!/usr/bin/ruby -w

enum = IO.foreach(File.join(__dir__, 'hello.txt'))

# Getting the first line
p enum.first

# Getting the 100th line
# take(100) still builds an array of the first 100 lines,
# so memory use grows with the line number requested
p enum.take(100)[-1]

# The time consuming but memory efficient way
# reading the 3,000,000th line:
# step the enumerator with a while loop, keeping only one line at a time

index, i = 3_000_000 - 1, 0
enum.next && i += 1 while i < index
p enum.next    # reading the 3,000,000th line

Running time:

time ruby p.rb 
"#!/usr/bin/ruby -w #1\n"
"#!/usr/bin/ruby -w #100\n"
"#!/usr/bin/ruby -w #3000000\n"

real    0m2.341s
user    0m2.274s
sys     0m0.050s
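Another way to skip lines without building an intermediate array is a lazy enumerator. The following is a sketch, not a measured alternative to the script above; `nth_line_lazy` is a hypothetical name.

```ruby
# Returns line number `n` (1-based, n >= 1) of the file at `path`, or nil
# if the file is shorter. `nth_line_lazy` is a hypothetical helper name.
def nth_line_lazy(path, n)
  # lazy.drop(n - 1) discards the leading lines one at a time instead of
  # collecting them, and first stops reading once the target line arrives.
  File.foreach(path).lazy.drop(n - 1).first
end
```

Note that a plain `enum.drop(n - 1)` without `lazy` would collect all the remaining lines into an array, which is the memory problem the question is trying to avoid.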

There may be other ways, such as IO#readpartial, IO#sysread, and so on. But IO.foreach and IO#each_line are the easiest to use and reasonably fast.

Hope this helps!
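For the curious, a chunked version built on IO#readpartial could be sketched as follows. The helper name `nth_line_chunked` and the 64 KiB default chunk size are assumptions made for illustration, and nothing here has been benchmarked against each_line.

```ruby
# Returns line number `n` (1-based) of the file at `path`, reading the file
# in fixed-size chunks. `nth_line_chunked` and the default chunk size are
# illustrative assumptions. The result is a binary-encoded String.
def nth_line_chunked(path, n, chunk_size: 64 * 1024)
  buffer = String.new # mutable binary buffer holding unprocessed bytes
  remaining = n       # lines still to consume, including the target line
  File.open(path, 'rb') do |f|
    loop do
      # Peel off every complete line currently sitting in the buffer.
      while (newline = buffer.index("\n"))
        line = buffer.slice!(0..newline)
        remaining -= 1
        return line if remaining.zero?
      end
      begin
        buffer << f.readpartial(chunk_size)
      rescue EOFError
        break # readpartial signals end of file by raising EOFError
      end
    end
  end
  # A final line without a trailing newline still counts as a line.
  (remaining == 1 && !buffer.empty?) ? buffer : nil
end
```

This is noticeably more code than the each_line version for the same result, which supports the point that IO.foreach and IO#each_line are the easier choice.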

score 0 · Accepted Answer

Taken from https://www.rosettacode.org/wiki/Read_a_specific_line_from_a_file#Ruby

 seventh_line = open("/etc/passwd").each_line.take(7).last
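Two things to keep in mind with this one-liner: take(7) builds an array of the first seven lines, and if the file has fewer than seven lines, last quietly returns the final line instead. A sketch that guards against the second case and closes the file is below; `nth_line_strict` is a hypothetical name.

```ruby
# Returns line number `n` (1-based) of the file at `path`, or nil when the
# file has fewer than `n` lines. `nth_line_strict` is a hypothetical name.
def nth_line_strict(path, n)
  # The block form of File.open closes the file handle afterwards.
  File.open(path) do |f|
    lines = f.each_line.take(n) # array of at most n lines
    # Only trust `last` if all n lines were actually present.
    lines.length == n ? lines.last : nil
  end
end
```

Because take(n) keeps n lines in memory, this suits small line numbers; for something like line 3,000,000 a streaming approach is the safer choice.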
