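I have a ".CSV" file that I'm trying to parse with CSV in Ruby. The file has two rows of headers, which I've never run into before, and I don't know how to handle it. Below is an example of the headers and a row.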

Row 1

"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"

Row 2

"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"

Row 3

"721","Air Force","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"

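There are no blank lines between the rows in the actual file; I only added them here to make it easier to read. Does CSV have a method for handling this structure, or will I have to write my own? Thanks!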

5 Answers

It looks like your CSV file was generated from an Excel spreadsheet whose columns are grouped like this:

... |        Rushing        |         Passing         | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...

(Not sure I reconstructed the groups correctly.)

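AFAIK there's no standard tool for handling this kind of CSV file. You have to do the work yourself:

  • Read the first line and treat it as the first header row.
  • Read the second line and treat it as the second header row.
  • Read the third line and treat it as the first data row.
  • ...
answered 2013-06-06T00:57:06.440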
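The manual approach above can be sketched with Ruby's standard `csv` library. This is a minimal sketch under a few assumptions: the helper name `combine_headers` is hypothetical, each group name in the first header row is assumed to span the blank cells to its right (as in the diagram above), and the sample is shortened to six columns. Joining the two header rows turns repeated sub-headers such as `Yards` and `TD` into distinct keys.

```ruby
require 'csv'

# Hypothetical helper: merge the two header rows into one list of names.
# A blank cell in the group row inherits the nearest group name to its left
# (e.g. "Rushing" spans Rushes/Gain/...). A missing or blank sub-header
# (the second row is shorter than the first) falls back to the group name.
def combine_headers(group_row, sub_row)
  current_group = nil
  group_row.each_with_index.map do |group, i|
    current_group = group unless group.to_s.empty?
    sub = sub_row[i].to_s
    sub.empty? ? current_group.to_s : "#{current_group} #{sub}"
  end
end

# Shortened sample in the question's layout (first six columns only).
text = <<~TEXT
  "Institution ID","Institution","Rushing","","Passing",""
  "","","Rushes","Gain","Att","Cmp"
  "721","Air Force","","","",""
TEXT

group_row, sub_row, *data_rows = CSV.parse(text)
headers = combine_headers(group_row, sub_row)
records = data_rows.map { |row| headers.zip(row).to_h }

p headers
# => ["Institution ID", "Institution", "Rushing Rushes", "Rushing Gain", "Passing Att", "Passing Cmp"]
p records.first
```

For the real file, `CSV.read("path/to/file.csv")` could replace `CSV.parse(text)`.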

I'd recommend using the smarter_csv gem and supplying the correct headers yourself:

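 require 'smarter_csv'
 options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name",
                                       # ... provide all headers here ...
                                      ],
            :headers_in_file => false}
 data = SmarterCSV.process(filename, options)
 data.shift # to ignore the first header line
 data.shift # to ignore the second header line
 # data now contains an array of hashes with your data

See the GitHub page for options and examples: https://github.com/tilo/smarter_csv

The option you should use is :user_provided_headers; just specify the headers you want in an array. That way you can handle situations like this one.

Because :headers_in_file is false, the two header lines in the file come back as the first two elements of data, so you have to call data.shift twice to drop them (data.pop would remove the last two data rows instead).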
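If you'd rather not add a gem, the same user-provided-headers idea can be sketched with Ruby's standard `csv` library, which accepts an Array for its `headers:` option. This is a swapped-in technique, not smarter_csv's API; the header names below are hypothetical and the sample is shortened to five columns.

```ruby
require 'csv'

# Shortened sample in the question's layout (first five columns only).
text = <<~TEXT
  "Institution ID","Institution","Rushing","","Punting"
  "","","Rushes","Gain","No."
  "721","Air Force","","","2"
TEXT

# Hand-written, unique header names (hypothetical; extend to every column).
names = ["Institution ID", "Institution", "Rushing Rushes", "Rushing Gain", "Punting No."]

# With an Array for headers:, CSV treats every line in the input as data,
# so the file's two header lines come back as ordinary rows; drop(2) skips them.
rows = CSV.parse(text, headers: names).drop(2)

rows.each { |row| puts "#{row['Institution']}: #{row['Punting No.']} punts" }
# prints "Air Force: 2 punts"
```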

answered 2013-06-06T01:30:25.037

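You have to write your own logic. CSV is really just rows and columns; it has no idea what each column or row actually means, it's just raw data. So CSV has no concept of there being two header rows. That's a human interpretation, so you need to build your own heuristics.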

Given that your data rows look like this:

"721","Air Force","09/01/12",

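When you start parsing, check the first column: if it represents an integer, convert it to an int, and if it's > 0, then you know you're dealing with a valid data row rather than a header.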
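A minimal sketch of that heuristic with the standard `csv` library, assuming a shortened three-column sample. Both header rows start with a non-numeric string ("Institution ID" or ""), and `String#to_i` returns 0 for those.

```ruby
require 'csv'

# Shortened sample in the question's layout (first three columns only).
text = <<~TEXT
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
TEXT

# Keep only rows whose first field converts to a positive integer;
# "Institution ID".to_i and "".to_i are both 0, so the header rows drop out.
data_rows = CSV.parse(text).select { |row| row[0].to_i > 0 }

data_rows.each { |row| p row }
# => ["721", "Air Force", "09/01/12"]
```

Note that `to_i` is lenient (`"12abc".to_i` is 12); matching the field against `/\A\d+\z/` is a stricter test if the ID column might contain such values.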

answered 2013-06-06T00:42:58.117

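Read the CSV file and skip the first two rows of the result:

require 'csv'

arr_of_arrs = CSV.read("path/to/file.csv")
# drop(2) skips the two header rows and returns the remaining rows
arr_of_arrs.drop(2).each do |x|
  # operation here
end
answered 2013-06-06T00:53:10.887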

This is easy to do with CSV. Just check the line number of the line that was just read, and skip rows until you're past the headers:

require 'csv'

CSV.foreach('test.csv') do |row|
  next unless $. > 2
  puts "'" + row.join("', '") + "'"
end

When run, it outputs:

'721', 'Air Force', '09/01/12', '19', 'BASKA', 'DAVID', '', '', '', '', '', '', '', '', '', '', '', '0', '0', '', '', '', '', '', '', '', '', '', '2', '85', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0'

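$. is the line number of the last line read from the open file. So the loop immediately skips (via next) every row until more than two lines have been read.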
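A variant that numbers the parsed rows itself instead of relying on the global `$.`. It assumes `CSV.foreach` returns an Enumerator when called without a block, so `with_index` can be chained onto it. The temp file only makes the sketch runnable on its own; real code would just pass 'test.csv'. Note that `with_index` counts parsed rows, not physical lines, which only differs if a quoted field contains a newline.

```ruby
require 'csv'
require 'tempfile'

# Write a small file in the question's layout so the sketch is self-contained.
file = Tempfile.new(['stats', '.csv'])
file.write(<<~TEXT)
  "Institution ID","Institution","Rushing"
  "","","Rushes"
  "721","Air Force",""
TEXT
file.close

kept = []
# with_index(1) numbers parsed rows starting at 1, so rows 1 and 2
# (the two header lines) are skipped.
CSV.foreach(file.path).with_index(1) do |row, row_number|
  next if row_number <= 2
  kept << row
  puts "'" + row.join("', '") + "'"
end
# prints: '721', 'Air Force', ''
```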

answered 2013-06-06T06:59:24.893