ruby - Working with strings and arrays from a file in Ruby

Question

I have a text file ("dict.txt") containing 8K+ English words:

apple -- description text
angry -- description text
bear -- description text
...

I need to remove all the text after the "--" on every line of the file.

What is the simplest and fastest way to do this?

score 1 · Accepted Answer

File.read("dict.txt").gsub(/(?<=--).*/, "")

Output:

apple --
angry --
bear --
...

Answered 2013-10-30T15:44:37.110
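
The one-liner above returns a new string and keeps the "--" itself. As a minimal sketch, assuming the three sample lines from the question stand in for the file contents, the following shows what the lookbehind keeps, plus a variant that also drops the separator. The variable names and the second pattern are illustrative, not part of the original answer.

```ruby
# Sample input standing in for the contents of dict.txt (assumed data).
text = "apple -- description text\n" \
       "angry -- description text\n" \
       "bear -- description text\n"

# The lookbehind keeps the "--" and removes everything after it on each line.
# "." does not match a newline by default, so the line breaks survive.
with_separator = text.gsub(/(?<=--).*/, "")

# Variant (an assumption about the desired result): also drop the separator
# and the spaces or tabs in front of it, leaving only the word.
words_only = text.gsub(/[ \t]*--.*/, "")

puts with_separator
puts words_only
```

Neither call touches the file on disk; the result still has to be written out, for example with File.write.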

score 1 · Accepted Answer

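# Keep each line up to and including the first "--".
# Searching for "--" rather than "-" keeps hyphenated words intact.
lines_without_description = File.read('dict.txt').lines.map { |line| line[0..line.index('--') + 1] }
File.open('dict2.txt', 'w') { |f| f.write(lines_without_description.join("\n")) }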

score 1 · Accepted Answer

If you want speed, you may want to consider doing this with sed on the command line:

sed 's/ -- .*//' < dict.txt > new_dict.txt

This creates a new file, new_dict.txt, containing only the words.

score 1 · Accepted Answer

Starting with:

words = [
  'apple -- description text',
  'angry -- description text',
  'bear -- description text',
]

If you only want the word before the "--":

words.map{ |w| w.split(/\s-+\s/).first }  # => ["apple", "angry", "bear"]

Or:

words.map{ |w| w[/^(.+) --/, 1] } # => ["apple", "angry", "bear"]

If you want the word and the "--":

words.map{ |w| w[/^(.+ --)/, 1] } # => ["apple --", "angry --", "bear --"]
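
The split and capture forms agree on the sample data but differ on lines that have no separator. A small sketch of that difference, assuming two extra entries ("well-known" and "orphan") that are made up for illustration and are not in the question's file:

```ruby
# Hypothetical entries: a hyphenated word and a line without " -- ".
words = [
  'apple -- description text',
  'well-known -- description text',
  'orphan',
]

# split returns the whole line when the separator is missing.
by_split = words.map { |w| w.split(/\s-+\s/).first }

# The capture form returns nil when the pattern does not match.
by_capture = words.map { |w| w[/^(.+) --/, 1] }

p by_split    # => ["apple", "well-known", "orphan"]
p by_capture  # => ["apple", "well-known", nil]
```

If the file might contain lines without a description, the split form is probably the safer default, since it never yields nil.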

If the goal is to create a version of the file without the descriptions:

File.open('new_dict.txt', 'w') do |fo|
  File.foreach('dict.txt') do |li|
    fo.puts li.split(/\s-+\s/).first
  end
end

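In general, to avoid scalability problems if/when your input file grows to huge proportions, use foreach to iterate over the input file and process it one line at a time. In terms of processing speed, it's a wash between iterating line by line and slurping the whole file in and processing it as a buffer or an array. Slurping a huge file, however, can slow the machine to a crawl or crash your code, which makes it infinitely slower; line-by-line IO is surprisingly fast and has none of those potential problems.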
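
The line-by-line approach can be wrapped in a small reusable method. This is only a sketch: the method name strip_descriptions and the temporary files used for the demo are assumptions made for illustration, not something from the answer above.

```ruby
require "tempfile"

# Hypothetical helper: copies src to dst, keeping only the text before
# the first " -- " on each line.
def strip_descriptions(src, dst)
  File.open(dst, "w") do |out|
    # File.foreach yields one line at a time, so only the current line
    # is held in memory, however large the input file is.
    File.foreach(src) do |line|
      out.puts line.chomp.split(/\s-+\s/, 2).first
    end
  end
end

# Demo on a small temporary file standing in for dict.txt.
Tempfile.create("dict") do |input|
  input.write("apple -- description text\nangry -- description text\n")
  input.flush
  output_path = "#{input.path}.out"
  strip_descriptions(input.path, output_path)
  puts File.read(output_path)
  File.delete(output_path)
end
```

Passing a limit of 2 to split stops after the first separator, so a long description is not broken into pieces that would only be thrown away.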
