ruby - What is the easiest way to get the headers from a CSV file in Ruby?

Question

All I need to do is get the headers from a CSV file.

file.csv is:

"A", "B", "C"  
"1", "2", "3"

My code is:

table = CSV.open("file.csv", :headers => true)

puts table.headers

table.each do |row|
  puts row 
end

This gives me:

true
"1", "2", "3"

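I've been looking at the Ruby CSV documentation for hours and it's driving me crazy. I'm sure there must be a simple one-liner that returns the headers to me. Any ideas?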

score 18 · Accepted Answer

It looks like CSV.read gives you access to a headers method:

headers = CSV.read("file.csv", headers: true).headers
# => ["A", "B", "C"]

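The above is really just a shortcut for CSV.open("file.csv", headers: true).read.headers. You may have tried to use CSV.open for this, but since CSV.open doesn't actually read the file when you call it, it has no way of knowing what the headers are until it has read some data. This is why it just returns true in your example. Once it has read some data, it returns the headers: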

  table = CSV.open("file.csv", :headers => true)
  table.headers
  # => true
  table.read
  # => #<CSV::Table mode:col_or_row row_count:2>
  table.headers
  # => ["A", "B", "C"]
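If you only want the headers, you probably don't need to read the whole table to trigger this. A minimal sketch, assuming a well-formed CSV file with a header line (the helper name headers_after_first_row is made up for illustration): pulling a single row with shift should be enough for headers to be populated.

```ruby
require 'csv'

# Hypothetical helper, not part of the CSV library.
# Opens the file lazily, parses one data row (which also parses the
# header line), then returns the headers. The block form of CSV.open
# closes the file and returns the block's value.
def headers_after_first_row(path)
  CSV.open(path, headers: true) do |csv|
    csv.shift      # read one row so the parser sees the header line
    csv.headers    # now an Array of header strings instead of true
  end
end
```

Usage would look like headers_after_first_row("file.csv"). Only the header line and one data row are parsed, so the rest of the file is never loaded.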

score 15 · Accepted Answer

In my opinion, the best way to do this is:

headers = CSV.foreach('file.csv').first

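Note that it is very tempting to use CSV.read('file.csv', headers: true).headers, but the catch is that CSV.read loads the complete file into memory, which increases your memory footprint and is also slow for larger files. Use CSV.foreach whenever possible. Here is a benchmark for a file of just 20 MB: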

Ruby version: ruby 2.4.1p111 
File size: 20M  
****************
Time and memory usage with CSV.foreach:
Time: 0.0 seconds
Memory: 0.04 MB
****************
Time and memory usage with CSV.read:
Time: 5.88 seconds
Memory: 314.25 MB

A 20 MB file adds 314 MB to the memory footprint with CSV.read; imagine what a 1 GB file would do to your system. In short, please don't use CSV.read. I did, and it crashed my system on a 300 MB file.
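The reason CSV.foreach stays cheap is that, called without a block, it returns an Enumerator, and first stops after a single row. A minimal sketch of wrapping that up, assuming the first line of the file is the header line (first_row_headers is a hypothetical name, not a CSV method):

```ruby
require 'csv'

# Hypothetical helper, not part of the CSV library.
# CSV.foreach without a block returns an Enumerator; calling first on it
# parses only the first row and then stops iterating.
# Returns an Array of strings, or nil when the file has no rows.
def first_row_headers(path)
  CSV.foreach(path).first
end
```

The nil case for an empty file is worth handling in the caller, since calling something like each on the result would otherwise fail.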

Further reading: if you want to learn more about this, here is a very good article on handling large files.

Below is the script I used to benchmark CSV.foreach and CSV.read:

require 'benchmark'
require 'csv'
def print_memory_usage
  memory_before = `ps -o rss= -p #{Process.pid}`.to_i
  yield
  memory_after = `ps -o rss= -p #{Process.pid}`.to_i
  puts "Memory: #{((memory_after - memory_before) / 1024.0).round(2)} MB"
end

def print_time_spent
  time = Benchmark.realtime do
    yield
  end
  puts "Time: #{time.round(2)} seconds"
end

file_path = '{path_to_csv_file}'
puts 'Ruby version: ' + `ruby -v`
puts 'File size:' + `du -h #{file_path}`
puts 'Time and memory usage with CSV.foreach: '
print_memory_usage do
  print_time_spent do
    headers = CSV.foreach(file_path, headers: false).first
  end
end
puts 'Time and memory usage with CSV.read:'
print_memory_usage do
  print_time_spent do
    headers = CSV.read(file_path, headers: true).headers
  end
end

score 2 · Accepted Answer

If you want a shorter answer, you can try:

headers = CSV.open("file.csv", &:readline)
# => ["A", "B", "C"]
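In case the &:readline shorthand is unfamiliar, here is roughly what that one-liner expands to. This is just a sketch; header_line is a hypothetical helper name used for illustration.

```ruby
require 'csv'

# Hypothetical helper, not part of the CSV library.
# Same idea as CSV.open(path, &:readline), with the block written out.
# The block form of CSV.open closes the file when the block finishes
# and returns the block's value.
def header_line(path)
  CSV.open(path) do |csv|
    csv.readline   # alias of CSV#shift: parses and returns the next row
  end
end
```

Since no headers: option is passed, the first line comes back as a plain Array of strings, and nothing past that line is parsed.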
