34

Ubuntu 12.04 LTS

Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

Following is the content of the CSV file I receive:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"

However, when I try to parse the CSV file I get an error:

1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 

1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

Then I tried simplifying the data, i.e.

"name","age","email"
"jignesh","30","jignesh@example.com"

But I still get the same error:

      1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
      from (irb):23
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

I tried simplifying the data again, like this:

name,age,email
jignesh,30,jignesh@example.com

That works. See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  jignesh@example.com
   => nil 

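But I am going to receive CSV files with quoted data, so removing the quotes is not really the solution I am looking for. I cannot figure out what is causing the error CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.

I have verified that there are no leading/trailing spaces in the CSV by enabling "Show white space characters" and "Show line endings" in my text editor. I also verified the encoding using the following: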

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 

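Note: I also tried CSV.read, but that method gives the same error.

Can anybody please help me get past this problem and understand where things are going wrong?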

======================

I just found the following post at http://www.ruby-forum.com/topic/448070 and tried the following:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)

  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

and got the following output:

   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

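The data is not read correctly: with the double quotes replaced by single quotes the fields are no longer treated as quoted, so the default col_sep (a comma) splits the date field in two. I then tried the quote_char option like this: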

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")

But I ended up with the following error:

   CSV::MalformedCSVError (Illegal quoting in line 1.):

Thanks, Jignesh

4

10 Answers

32
quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end

It's not perfect, but it works most of the time.

N.B. CSV.parse takes the same parameters as CSV.read, so either a file or data in memory can be used.

answered 2013-09-27T04:09:35.950
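
The same fallback idea can be applied to data already in memory. The following is a minimal sketch, assuming a hypothetical helper named parse_with_fallback and made-up sample data; it is not part of the csv library. It tries each candidate quote character in turn instead of using retry:

```ruby
require "csv"

# Hypothetical helper: try each candidate quote character in turn and return
# the first successful parse. Re-raises the last error if none of them works.
def parse_with_fallback(data, candidates = ['"', "|", "~"], **options)
  last_error = nil
  candidates.each do |quote_char|
    begin
      return CSV.parse(data, quote_char: quote_char, **options)
    rescue CSV::MalformedCSVError => e
      last_error = e
    end
  end
  raise(last_error || ArgumentError.new("no quote characters given"))
end

# Made-up sample: the second row contains a bare double quote (an inch mark).
data = %(sku,description,qty\nA1,5" pipe,2\n)
rows = parse_with_fallback(data)
puts rows.inspect
```

Keep in mind that once a fallback character is chosen, double quotes are treated as ordinary data, so fields that really were quoted keep their quote marks.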
24

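Anand, thanks for the encoding suggestion. That resolved the illegal quoting issue for me.

Note: if you want the iterator to skip the header row, add headers: :first_row, like so: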

CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row)
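
The output in the question shows \xEF\xBB\xBF in front of the first header, which is a UTF-8 byte order mark (BOM). The sketch below, which uses a made-up temporary file and hypothetical helper names, shows one way to check for a BOM and to read the file with the "bom|utf-8" encoding so the mark is stripped before parsing:

```ruby
require "csv"
require "tempfile"

# The three bytes of a UTF-8 byte order mark.
BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: write a small quoted CSV file that starts with a BOM.
def write_sample_with_bom
  file = Tempfile.new(["bom_sample", ".csv"])
  file.binmode
  file.write(BOM)
  file.write(%("name","age","email"\n"jignesh","30","jignesh@example.com"\n))
  file.close
  file
end

# Hypothetical helper: true when the file's first three bytes are the BOM.
def starts_with_bom?(path)
  File.binread(path, 3) == BOM
end

sample = write_sample_with_bom
puts "BOM present: #{starts_with_bom?(sample.path)}"

# "bom|utf-8" asks Ruby to drop a leading BOM when the file is opened.
rows = CSV.read(sample.path, encoding: "bom|utf-8")
puts rows.first.inspect
```

A check like starts_with_bom? is useful because a text editor's "show whitespace" mode and String#encoding generally do not reveal the mark.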
answered 2015-04-02T20:37:50.063
14

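I just had an issue like this and discovered that CSV does not like spaces between the col_sep and the quote character. Once I removed those, everything went fine. So I had: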

12,  "N",  12, "Pacific/Majuro"

but once I used

.gsub(/,\s+"/, ',"')

which results in

12,"N",  12,"Pacific/Majuro"

everything went fine.

answered 2013-10-18T18:30:07.747
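
As a rough illustration of this whitespace fix, the sketch below normalizes the sample line from this answer and then parses it. The variable names are only for illustration:

```ruby
require "csv"

# The line from this answer, with spaces between the separator and the quotes.
line = %(12,  "N",  12, "Pacific/Majuro")

# Remove whitespace that sits between a comma and an opening double quote.
cleaned = line.gsub(/,\s+"/, ',"')
puts cleaned

# Unquoted fields keep their leading spaces; only the quoted ones were touched.
row = CSV.parse_line(cleaned)
puts row.inspect
```

Note that this substitution also rewrites any comma-space-quote sequence that occurs inside a quoted field, so it is only safe when the data cannot contain one.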
6

Version for Rails 6, Ruby 2.4+:

CSV.foreach(file, liberal_parsing: true, headers: :first_row) do |row|
  # do whatever
end

https://ruby-doc.org/stdlib-2.4.0/libdoc/csv/rdoc/CSV.html
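
To make the effect of liberal_parsing concrete, here is a small sketch with a made-up line that has double quotes in the middle of an unquoted field. It assumes Ruby 2.4 or later, where the option is available:

```ruby
require "csv"

# Made-up line: the second field contains quotes but is not itself quoted.
line = %(a,b "quoted" c,d)

# Strict parsing (the default) is expected to reject this line.
strict = begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  "raised: #{e.message}"
end
puts strict.inspect

# Liberal parsing keeps the stray quotes as part of the field value.
liberal = CSV.parse_line(line, liberal_parsing: true)
puts liberal.inspect
```

The option makes the parser more tolerant rather than fixing the data, so it is worth checking that the resulting fields are what you expect.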

answered 2020-08-25T07:53:45.950
5

Try passing the option :quote_char => "|":

CSV.read(filename, :quote_char => "|")

answered 2019-12-16T09:37:49.673
2

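I had a problem with the trademark character raising this error.

In my data the trademark character had been converted to \"! in UTF-8, so it was the unclosed quote mark that raised the error. So I did this: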

.gsub!("\"!", "")

Then I tried creating my CSV object and it worked fine.

answered 2016-02-25T12:11:25.880
2

Add the :liberal_parsing => true parameter to CSV.read; this should solve some of the "Illegal quoting" issues.

answered 2020-11-03T13:40:04.883
0

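I was trying to read a file into a string and then parse that string into a CSV table, but I got an exception:

CSV.parse(File.read('file.csv'), headers: true)
CSV::MalformedCSVError: Unclosed quoted field on line 1794.

None of the answers given here worked for me. In fact, the one with the highest votes took so long to parse that I eventually terminated the execution. It most likely raised many exceptions, and that is costly for a large file.

More problematic, the error was not very helpful because it is a large CSV file. Where exactly is line 1794? I opened the file in LibreOffice, which opened it without any problems. Line 1794 was the last line of data in the CSV file, so evidently the problem had to do with the end of the file. I decided to inspect the contents as a string with File.read, and noticed that the string ended with a carriage return: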

,\"\"\r

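I decided to use chomp to remove the carriage return at the end of the file. Note that chomp also removes carriage returns if $/ has not been changed from the default Ruby record separator (that is, it will remove \n, \r, and \r\n).

CSV.parse(File.read('file.csv').chomp, headers: true)
 => #<CSV::Table mode:col_or_row row_count:1794>

It worked. The problem was the \r character at the end of the file.

answered 2018-12-23T00:14:55.693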
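
For reference, a small sketch of the chomp behaviour described here, using made-up in-memory data in place of file.csv:

```ruby
require "csv"

# With the default record separator, chomp strips one trailing \n, \r, or \r\n.
puts ["a\n", "a\r", "a\r\n"].map(&:chomp).inspect

# Made-up data: rows end with \n, but the very last character is a stray \r.
raw = %("id","note"\n"1",""\r)
trimmed = raw.chomp
puts trimmed.inspect

# After the stray \r is removed, only \n line endings remain to be parsed.
table = CSV.parse(trimmed, headers: true)
puts table.size
```

Because chomp removes at most one line ending, a file that ends with several blank lines would need something like a trailing-whitespace strip instead.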
-2

A less common cause of this error is when the file doesn't do any field quoting, but quote_char is still set (by default it's ") and one or more fields happen to contain the character.

To disable field quoting entirely, set quote_char: nil in the parsing options.

For example, given a file /tmp/people.csv like this:

Actor,Dwayne "The Rock" Johnson,1972-05-02
Character,TV's Frank,1956-08-30

It could be parsed with this:

CSV.read('/tmp/people.csv', quote_char: nil)
answered 2021-02-28T21:00:44.490
-4

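Try this tip:

  1. Open the CSV file in a text editor
  2. Select the whole file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and save it
  5. Import the new CSV file
answered 2013-09-10T08:16:52.677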