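I receive a tab-delimited file that was saved with the default character set "Unicode". As far as I understand, "Unicode" here probably means UTF-16.
When I try to read the file with this command: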
CSV.foreach(file, :col_sep => "\t", :headers => true) do |column|
  puts column[0]
end
I get the following error:
invalid byte sequence in UTF-8
I know that if I open the file and save it as "UTF-8" it works fine, but I can't open the file by hand and do that every time. How can I get around this error?
Edit:
When passing in encoding: 'UTF-16BE', as stefan requested below, I get:
invalid byte sequence in UTF-16BE
Maybe I'm passing the wrong encoding option?
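One way to test the UTF-16 guess is to look at the first bytes of the file for a byte order mark (BOM). The following is only a minimal sketch: `bom_encoding` is a hypothetical helper, not part of Ruby's standard library, the sample bytes are made up, and it assumes the file actually starts with a BOM (it does not distinguish UTF-32).

```ruby
# Hypothetical helper (not part of Ruby's standard library): guess an
# encoding from the byte order mark (BOM) at the start of the data.
def bom_encoding(bytes)
  bytes = bytes.b # compare as raw bytes, whatever encoding tag the string has
  return 'UTF-8'    if bytes.start_with?("\xEF\xBB\xBF".b)
  return 'UTF-16LE' if bytes.start_with?("\xFF\xFE".b)
  return 'UTF-16BE' if bytes.start_with?("\xFE\xFF".b)
  nil # no BOM found; the encoding has to be determined some other way
end

# Made-up sample: the first bytes of a UTF-16LE file that begins with "Status".
sample = "\xFF\xFE".b + '"Status"'.encode('UTF-16LE').b
puts bom_encoding(sample)
```

For a real file, the same check could be run on `File.binread(file, 4)`. A result of 'UTF-16LE' rather than 'UTF-16BE' would be consistent with the "invalid byte sequence in UTF-16BE" error above.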
Edit 2:
When passing in :encoding => 'ISO-8859-1', I get this error:
Illegal quoting in line 1. (CSV::MalformedCSVError)
Line 1 of my file looks like this:
"Status" "Internal ID" "Language" "Created At" "Updated At" "IP Address" "Location" "Username" "GET Variables" "Referrer" "Number of Saves" "Weighted Score" "Completion Time" "Invite Code" "Invite Email" "Invite Name" "Invite: branchid" "Invite: lastname" "Invite: clientname" "Invite: membershipid" "Invite: clientid" "Invite: dateofbirth" "Invite: membershiptype" "Invite: branch" "Invite: unitid" "Invite: shortname" "Invite: changedatetime" "Invite: homephone" "Collector"
I tried passing in a quote_char, but I got the same error. My code now looks like this:
CSV.foreach(file, :col_sep => "\t", :encoding => 'ISO-8859-1', :quote_char => '"', :headers => true) do |column|
  puts column[0]
end
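If the file really is UTF-16LE with a BOM (an assumption, not something confirmed here), then reading it as ISO-8859-1 would leave the BOM bytes and a NUL byte after every character in the data, which could plausibly explain the "Illegal quoting" error. A sketch of one possible workaround is to read the raw bytes, transcode them to UTF-8, and only then hand the text to CSV. `utf16le_to_utf8` is a hypothetical helper, and the sample data stands in for the real file.

```ruby
require 'csv'

# Hypothetical helper: decode raw UTF-16LE bytes into a UTF-8 string,
# dropping a leading BOM if one is present.
def utf16le_to_utf8(bytes)
  text = bytes.dup.force_encoding('UTF-16LE').encode('UTF-8')
  text.delete_prefix("\uFEFF")
end

# Made-up sample data standing in for File.binread(file).
raw = "\uFEFF\"Status\"\t\"Internal ID\"\nComplete\t42\n".encode('UTF-16LE').b

# Parse the transcoded text with the same options as in the question.
CSV.parse(utf16le_to_utf8(raw), col_sep: "\t", headers: true) do |row|
  puts row[0]
end
```

This reads the whole file into memory, which is probably acceptable for a small export but may not suit very large files.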