
I'm attempting to read in a file via ARGF and perform some operations on it. Before doing anything else I need to read in a specific line and validate its contents.

I'm calling my script like so:

./main.rb < input.txt

I'm attempting to access a specific line (let's say line 10) like so:

if __FILE__ == $0

    ARGF.lineno= 10
    puts "lineno: #{ARGF.lineno}" # Prints 10 (as expected)
    puts "readline: #{ARGF.readline}" # Prints contents of line 0 instead of 10!

end

I am able to manually set ARGF.lineno= per the docs, and this seems to work. However, when I then attempt to read the line I just set, I get the contents of line 0. What, if anything, am I doing wrong?

Note that looping through the lines to reach the given line is not an option, because my input data may be hundreds of thousands of lines long.

Thanks in advance for any help.


2 Answers

3

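If you look at the source code of the lineno= method, you'll see that it doesn't affect the input stream in any way: it simply overwrites the automatic line counter with the given value. If you want to skip to a certain line, you need to write your own method.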
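To make that behavior concrete, here is a minimal sketch that uses an ordinary File in place of ARGF (an assumption made only to keep the example self-contained; IO#lineno= is likewise documented as setting only the counter). The helper name lineno_demo and the sample data are hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: shows that assigning lineno leaves the read position alone.
def lineno_demo
  Tempfile.create("lineno_demo") do |f|
    f.write("alpha\nbeta\ngamma\n")
    f.rewind
    f.lineno = 2        # overwrites the counter only
    line = f.gets       # still reads from byte offset 0
    [line, f.lineno]    # the counter keeps incrementing from the value we set
  end
end

p lineno_demo  # → ["alpha\n", 3]
```

The first line comes back even though the counter claimed we were at line 2, which matches what the question observed with ARGF.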

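Note that files are stored as a sequence of bytes, not as lines. To skip to a specific line, you have to scan the file for line separators.

For example: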

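# Advance ARGF past the next num lines, discarding them as they are read.
def ARGF.skip_lines(num)
  enum = each_line          # enumerator over the remaining lines
  num.times { enum.next }   # consume num lines one at a time
  self
end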

I tested this on a 36 MB file with 600,000 lines, and it skipped from the first line to the last in about 1 second.

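If you control the input format, you could pad every line to a fixed length and then use IO#seek to jump straight to a given line. But that has other drawbacks.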
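A rough sketch of that fixed-length idea follows. RECORD_WIDTH, write_padded, read_padded_line and padded_demo are hypothetical names, and the code assumes single-byte (ASCII) content with no line longer than RECORD_WIDTH - 1 characters; one of the drawbacks is that longer lines would break the layout.

```ruby
require "tempfile"

RECORD_WIDTH = 16 # assumed bytes per record, including the trailing "\n"

# Pad each line with spaces so every record occupies exactly RECORD_WIDTH bytes.
def write_padded(io, lines)
  lines.each { |line| io.write(line.ljust(RECORD_WIDTH - 1) + "\n") }
end

# Jump directly to a zero-based line index and return it without the padding.
def read_padded_line(io, index)
  io.seek(index * RECORD_WIDTH, IO::SEEK_SET)
  io.read(RECORD_WIDTH).to_s.rstrip
end

# Hypothetical demo: write four records, then fetch the third one directly.
def padded_demo
  Tempfile.create("padded_demo") do |f|
    write_padded(f, %w[zero one two three])
    f.flush
    read_padded_line(f, 2)
  end
end

p padded_demo  # → "two"
```

No earlier line is read to reach the requested one, but the input has to be produced in this padded format and the stream has to be seekable.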

answered 2013-10-15T14:38:17.137
1

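You want to use the pos= accessor; according to the documentation, lineno= doesn't appear to do anything.

pos= jumps to a byte offset, so you would need a fixed line length for this to work.

This makes sense when you think about it: the stream can't tell how many bytes are on each line of a file it hasn't read yet.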
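To illustrate why byte offsets and line numbers don't line up for variable-length lines, here is a small sketch that scans a stream once, records where each line starts, and only then uses pos= to jump. This is not something the answer proposes; line_offsets, line_at and offsets_demo are hypothetical helpers, and a StringIO stands in for the real input.

```ruby
require "stringio"

# Scan once and record the byte offset at which each line starts.
def line_offsets(io)
  offsets = []
  io.rewind
  until io.eof?
    offsets << io.pos
    io.gets
  end
  offsets
end

# Read a zero-based line by jumping to its previously recorded offset.
def line_at(io, offsets, index)
  io.pos = offsets.fetch(index)
  io.gets
end

# Hypothetical demo with lines of different lengths.
def offsets_demo
  io = StringIO.new("a\nbbb\ncc\n")
  offsets = line_offsets(io)
  [offsets, line_at(io, offsets, 1)]
end

p offsets_demo  # → [[0, 2, 6], "bbb\n"]
```

The offsets are only known after the scan, so the first lookup still costs a full pass over the data; the index pays off only when several lines need to be fetched afterwards.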

answered 2013-10-15T14:45:49.343