ruby - Reformatting a CSV in Ruby by manipulating rows and columns

Question

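I have to work with a CSV file that can't be used directly to generate simple charts. I need to reshape the file into something "cleaner", but I'm running into problems, and I'm not sure my overall strategy is right since I'm only just learning to parse files with Ruby. My problem is mainly about reading data at an offset from the line where I did or did not find a match. After finding a line that meets my criterion, I need to read the information from the 2 lines that follow it and manipulate it (move something from the last column to the second column).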

Here is the original CSV file:

component
quantity header,design1,design2,design3,Ref,Units
quantity type,#,#,#,ref#,unit value
component
quantity header,design1,design2,design3,Ref,Units
quantity type,#,#,#,ref#,unit value
component
quantity header,design1,design2,design3,Ref,Units
quantity type,#,#,#,ref#,unit value

Desired output:

Component Header,Quantity type Header,Units Header,design1 header,design2 header,design3 header,Ref header
component,quantity type,unit value,#,#,#,n/a
component,quantity type,unit value,#,#,#,n/a
component,quantity type,unit value,#,#,#,n/a
component,quantity type,unit value,#,#,#,n/a
component,quantity type,unit value,#,#,#,n/a

My current Ruby script:

require 'csv'
f = File.new("sp.csv")
o = CSV.open('output.csv', 'w')

f.each_line do |l| #iterate through each line
    data = l.split
    if l !~ /,/ #if the line does not contain a comma it is a component
        o << [data,f.gets] #start writing data, f.gets skips next line but need to skip 2 and split the line to manipulate columns
    else
        o << ['comma'] #just me testing that I can find lines with commas
    end
end

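f.gets skips the next line, and the documentation isn't clear on how to use it to skip 2. After that, I think I can split the line on commas and manipulate the row data with array[column]. Apart from this offset problem, I'm also not sure whether my general approach is a good strategy.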

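Edit

Here are some lines from the real file. I'll study the answers provided and see whether I can get everything working. My idea was to read and write line by line, rather than converting the whole file to an array and then reading and writing, on the assumption that this will use less memory as these files get bigger.

Thanks for the help; I'll work through the answers and get back to you.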

DCB
Result Quantity,BL::BL,BL::BL_DCB-noHeat,DC1::DC1,DC2::DC2,noHS::noHS,20mmHS::20mmHS,Reference,Units
Avg Temperature,82.915,69.226,78.35,78.383,86.6,85.763,N/A,Celsius
RCB
Result Quantity,BL::BL,BL::BL_DCB-noHeat,DC1::DC1,DC2::DC2,noHS::noHS,20mmHS::20mmHS,Reference,Units
Avg Temperature,76.557,68.779,74.705,74.739,80.22,79.397,N/A,Celsius
Antenna
Result Quantity,BL::BL,BL::BL_DCB-noHeat,DC1::DC1,DC2::DC2,noHS::noHS,20mmHS::20mmHS,Reference,Units
Avg Temperature,69.988,65.045,69.203,69.238,73.567,72.777,N/A,Celsius
PCBA_fiberTray
Result Quantity,BL::BL,BL::BL_DCB-noHeat,DC1::DC1,DC2::DC2,noHS::noHS,20mmHS::20mmHS,Reference,Units
Avg Temperature,66.651,65.904,66.513,66.551,72.516,70.47,N/A,Celsius

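Edit 2

Using some of the regexes from the answers below, I developed a line-by-line strategy for parsing the file. I've posted it as an answer for completeness.

Thanks for helping out and showing me ways to develop a solution.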

score 2 · Accepted Answer

How about slicing it into groups of 3 lines:

File.read("sp.csv").split("\n").each_slice(3) do |slice|
  o << [slice[0], *slice[2].split(',')]
end
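
To make the slicing idea concrete, here is a minimal sketch that also moves the units column forward and drops the reference column. It assumes every component occupies exactly 3 lines (name, header, one data line) and that no field contains an embedded comma; the sample text is inlined from the question's real data (two of the design columns only) so the snippet runs on its own, and `rows` is just an illustrative name.

```ruby
# Inlined sample; in practice this would come from File.read("sp.csv").
raw = <<~TEXT
  DCB
  Result Quantity,BL::BL,DC1::DC1,Reference,Units
  Avg Temperature,82.915,78.35,N/A,Celsius
  RCB
  Result Quantity,BL::BL,DC1::DC1,Reference,Units
  Avg Temperature,76.557,74.705,N/A,Celsius
TEXT

# Each slice is [component line, header line, data line].
rows = raw.split("\n").each_slice(3).map do |component, _header, values|
  # First field is the quantity, the last two are reference and units,
  # and everything in between is a design value.
  quantity, *numbers, _reference, units = values.split(',')
  [component, quantity, units, *numbers]
end

rows.each { |row| puts row.join(',') }
```

If a component can have more than one data line, fixed-size slices no longer line up and grouping on the component line is the safer choice.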

score 1 · Accepted Answer

I created a CSV file called "test.csv" based on the sample.

Starting with this code:

data = File.readlines('test.csv').slice_before(/^component/)

I get an enumerator. If I look at the data the enumerator will return, I get:

pp data.to_a

[["component\n",
  "quantity header,design1,design2,design3,Ref,Units\n",
  "quantity type,#,#,#,ref#,unit value\n"],
["component\n",
  "quantity header,design1,design2,design3,Ref,Units\n",
  "quantity type,#,#,#,ref#,unit value\n"],
["component\n",
  "quantity header,design1,design2,design3,Ref,Units\n",
  "quantity type,#,#,#,ref#,unit value\n"]]

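That's an array of arrays, broken into sub-arrays on the "component" lines. I suspect the values don't reflect reality, but without a more accurate sample... well, GIGO.

If the "component" line isn't really a bunch of repeating "component" lines, and doesn't contain any commas, you could use this instead: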

data = File.readlines('test.csv').slice_before(/\A[^,]+\Z/)

Or:

data = File.readlines('test.csv').slice_before(/^[^,]+$/)

The result will be the same with the current sample.
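
As a quick check of the comma-free pattern against lines shaped like the real file, here is a small sketch. The lines are inlined (shortened from the question's data) rather than read with File.readlines, and `groups` is just an illustrative name.

```ruby
# Lines as File.readlines would return them, newline included.
lines = [
  "DCB\n",
  "Result Quantity,BL::BL,Reference,Units\n",
  "Avg Temperature,82.915,N/A,Celsius\n",
  "RCB\n",
  "Result Quantity,BL::BL,Reference,Units\n",
  "Avg Temperature,76.557,N/A,Celsius\n"
]

# A new group starts at every line that contains no comma.
groups = lines.slice_before(/^[^,]+$/).to_a

groups.each { |group| puts "#{group.first.strip}: #{group.length} lines" }
```

Each group starts with its component name, so the rest of the group can be processed with the component already in hand.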

If you need a more complex regex you can substitute it, for instance:

/^(?:#{ Regexp.union(%w[component1 component2]).source })$/i

That returns a pattern that will find any of the words in the %w[] array:

/^(?:component1|component2)$/i
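
A short sketch of how such a pattern behaves, using component names taken from the question's real data. The `names` list and the sample `lines` are only illustrative.

```ruby
names = %w[DCB RCB Antenna]

# Regexp.union escapes each string and joins them with "|".
pattern = /^(?:#{Regexp.union(names).source})$/i

lines = ["DCB\n", "Result Quantity,BL::BL,Units\n", "rcb\n", "Other\n"]

# Only lines consisting of exactly one of the names match;
# the /i flag makes the comparison case-insensitive.
matches = lines.select { |line| line.match?(pattern) }

puts pattern.source
p matches
```

This is stricter than the comma-free pattern: a line such as "Other" has no comma but is not treated as a component because it is not in the list.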

From there we can walk the data array and strip out all the extraneous headers using:

data.map{ |a| a[2..-1] }.flatten

That returns something like:

[
  "quantity type,#,#,#,ref#,unit value\n",
  "quantity type,#,#,#,ref#,unit value\n",
  "quantity type,#,#,#,ref#,unit value\n"
]

That can be iterated over and passed to CSV to be parsed into arrays if needed:

data.map{ |a| a[2..-1].map{ |r| CSV.parse(r) }.flatten }

[
  ["quantity type", "#", "#", "#", "ref#", "unit value"],
  ["quantity type", "#", "#", "#", "ref#", "unit value"],
  ["quantity type", "#", "#", "#", "ref#", "unit value"]
]

That's all background to get you thinking about how you can tear apart the CSV data.

Using this code:

data.flat_map { |ary|
  component = ary[0].strip
  ary[2..-1].map{ |a|
    record = CSV.parse(a).flatten # named record so the block does not reassign the outer data
    [
      component,
      record.shift,
      record.pop,
      *record[0..-2]
    ]
  }
}

Returns:

[
  ["component", "quantity type", "unit value", "#", "#", "#"],
  ["component", "quantity type", "unit value", "#", "#", "#"],
  ["component", "quantity type", "unit value", "#", "#", "#"]
]

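All that's left to do is create the header you want to use, and pass the returned data back into CSV to let it generate the output file. You should be able to get there from here using the CSV documentation.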
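
For that last step, a minimal sketch could look like the following. The header labels are hypothetical (pick whatever your charting tool expects), the rows are a shortened copy of the reshaped data, and CSV.generate is used so the result can be inspected as a string.

```ruby
require 'csv'

# Rows in the shape produced by the flat_map step (values shortened).
rows = [
  ["DCB", "Avg Temperature", "Celsius", "82.915", "69.226"],
  ["RCB", "Avg Temperature", "Celsius", "76.557", "68.779"]
]

# Hypothetical header labels, one per output column.
header = ["Component", "Quantity", "Units", "BL::BL", "BL::BL_DCB-noHeat"]

csv_text = CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

puts csv_text
```

To write a file instead, CSV.open("output.csv", "w") accepts the same kind of block and closes the file when the block finishes.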

Edit:

Based on the actual data, here's a slightly tweaked version of the code, along with its output:

require 'csv'
require 'pp'

data = File.readlines('test.csv').slice_before(/^[^,]+$/)

pp data.flat_map { |ary|
  component = ary[0].strip
  ary[2..-1].map{ |a|
    record = CSV.parse(a).flatten
    [
      component,
      record.shift,
      record.pop,
      *record[0..-2]
    ]
  }
}

Which looks like:

[["DCB",
  "Avg Temperature",
  "Celsius",
  "82.915",
  "69.226",
  "78.35",
  "78.383",
  "86.6",
  "85.763"],
["RCB",
  "Avg Temperature",
  "Celsius",
  "76.557",
  "68.779",
  "74.705",
  "74.739",
  "80.22",
  "79.397"],
["Antenna",
  "Avg Temperature",
  "Celsius",
  "69.988",
  "65.045",
  "69.203",
  "69.238",
  "73.567",
  "72.777"],
["PCBA_fiberTray",
  "Avg Temperature",
  "Celsius",
  "66.651",
  "65.904",
  "66.513",
  "66.551",
  "72.516",
  "70.47"]]

score 1 · Accepted Answer

Here's the code I'm using, which creates the CSV file with all the manipulations. Thanks to those who offered help.

require 'csv'

file_in = File.new('sp1.csv')
file_out = CSV.open('output.csv', 'w')

header = []
row = []


file_in.each_line do |line|

  case line
  when /^[^,]+$/ #Find a component (line with no comma)
    comp_header = file_in.gets.split(',') #header is after component and is split into an array

    if header.empty? #header
      header.push("Component", comp_header[0], comp_header[-1].strip)
      comp_header[1..-3].each do |h|
        header.push(h)
      end
      file_out << header 

    end
    @comp = line.to_s.strip
    next
  when /,/ #when a row had commas
    puts @comp
    vals = line.split(',') #split up into vals array
    row.push(@comp, vals[0], vals[-1].strip) #add quantity and unit to row array
    vals[1..-3].each do |v| #for values (excluding quantity, units, reference info)
      row.push(v) #add values to row array
    end

  end
    file_out << row #write the current row to csv file
    row = [] #reset the row array to move on to the next component set

end
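
For comparison, here is a compact sketch of the same line-by-line idea, not a replacement for the script above. It assumes every component line is immediately followed by a header line and that no field contains an embedded comma. A StringIO stands in for the input file so the snippet runs on its own, and CSV.generate stands in for CSV.open so the result can be inspected; `input`, `output` and the sample text are illustrative.

```ruby
require 'csv'
require 'stringio'

# StringIO stands in for File.open('sp1.csv'); both respond to each_line and gets.
input = StringIO.new(<<~TEXT)
  DCB
  Result Quantity,BL::BL,DC1::DC1,Reference,Units
  Avg Temperature,82.915,78.35,N/A,Celsius
  RCB
  Result Quantity,BL::BL,DC1::DC1,Reference,Units
  Avg Temperature,76.557,74.705,N/A,Celsius
TEXT

output = CSV.generate do |csv|
  component = nil
  header_written = false

  input.each_line do |raw_line|
    line = raw_line.chomp
    next if line.empty?

    unless line.include?(',')
      # A comma-free line names the component; consume the header line after it.
      component = line
      quantity, *designs, _reference, units = input.gets.chomp.split(',')
      unless header_written
        csv << ['Component', quantity, units, *designs]
        header_written = true
      end
      next
    end

    # Data line: move units next to the quantity and drop the reference column.
    quantity, *values, _reference, units = line.split(',')
    csv << [component, quantity, units, *values]
  end
end

puts output
```

Only one line is held in memory at a time, which matches the goal of keeping memory use low for large files. With real files, the block form of CSV.open('output.csv', 'w') would also close the output file when the block ends.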
