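I know there are several similar questions about this error, and I've tried many of the suggested fixes, but without any luck. The problem I'm having involves the byte \xA1, which raises
ArgumentError: invalid byte sequence in UTF-8
I've tried the following without success: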
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace,
              :replace => "").sub('', '')
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace,
              :replace => "").force_encoding('UTF-8').sub('', '')
"\xA1".encode('UTF-8', :undef => :replace, :invalid => :replace,
              :replace => "").encode('UTF-8').sub('', '')
Every one of these lines raises the error for me. What am I doing wrong?
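For reference, here is a minimal diagnostic sketch of how the string can be inspected before transcoding. The variable name `s` is made up for illustration, and the string is explicitly tagged as UTF-8 on the assumption that this matches what a literal in a UTF-8 source file or IRB session would be.

```ruby
# Diagnostic sketch: look at the string before trying to transcode it.
# `s` is a hypothetical name; force_encoding only changes the tag, not the bytes.
s = "\xA1".dup.force_encoding('UTF-8')

puts s.encoding         # the encoding the string is tagged with
puts s.valid_encoding?  # false: a lone 0xA1 is not a complete UTF-8 sequence
p s.bytes               # the raw byte values
```

If `valid_encoding?` still returns false after the call to `encode`, the replacement options did not take effect on that string.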
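Update:
The lines above fail only in IRB. However, I changed my application to encode the lines of a CSV file with the same String#encode method and arguments, and I get the same error when the line is read from a file (note: the same operations work on the same string if no IO is involved).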
require 'tempfile'
require 'csv'

bad_line = "col1\tcol2\tbad\xa1"
bad_line.sub('', '') # does NOT fail
puts bad_line # => col1 col2 bad?
tmp = Tempfile.new 'foo' # write the line to a file to emulate real problem
tmp.puts bad_line
tmp.close
tmp2 = Tempfile.new 'bar'
begin
  IO.foreach tmp.path do |line|
    line.encode!('UTF-8', :undef => :replace, :invalid => :replace, :replace => "")
    line.sub('', '') # fail: invalid byte sequence in UTF-8
    tmp2.puts line
  end
  tmp2.close
  # this would fail if the above error didn't halt execution
  CSV.foreach(tmp2.path) do |row|
    puts row.inspect # fail: invalid byte sequence in UTF-8
  end
ensure
  tmp.unlink
  tmp2.close
  tmp2.unlink
end
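One possible workaround, sketched under assumptions rather than offered as a confirmed fix: on some Ruby versions, calling `encode` with a target encoding the string is already tagged with may be treated as a no-op, in which case the `:invalid => :replace` option never gets a chance to run. The sketch below sidesteps that by using `String#scrub`, which assumes Ruby 2.1 or later. The helper name `strip_invalid_utf8` is hypothetical and not part of any library.

```ruby
require 'tempfile'

# Hypothetical helper: remove byte sequences that are invalid in UTF-8.
# Assumes Ruby 2.1 or later, where String#scrub is available.
def strip_invalid_utf8(line)
  # Tag a copy as UTF-8 (no bytes change), then drop whatever is invalid.
  line.dup.force_encoding('UTF-8').scrub('')
end

tmp = Tempfile.new('foo')
begin
  # Write the raw bytes, including the stray 0xA1, to emulate the real file.
  tmp.binmode
  tmp.write("col1\tcol2\tbad\xA1\n")
  tmp.close

  # Clean every line as it is read back from disk.
  cleaned = File.foreach(tmp.path).map { |line| strip_invalid_utf8(line) }
  cleaned.each { |line| puts line.sub('', '') }
ensure
  tmp.unlink
end
```

Unlike transcoding from a binary source encoding, `scrub` leaves valid multibyte UTF-8 characters in place and removes only the bytes that cannot be decoded.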