
I'm opening a CSV file and reading values from it using File.open(filename).

So I do something like this:

my_file = File.open(filename)
my_file.each_line do |line|
 line_array = line.split("\t")
 ratio = line_array[1]
 puts "#{ratio}"
 puts ratio.isutf8?
end

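The problem I'm having is that the values in line_array seem to be in a strange format. For example, one of the values in a cell of the CSV file is 0.86. When I print it out, it looks like "0 . 8 6".

So it sort of acts like a string, but I'm not sure how it's encoded. When I try some introspection: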

ratio.isutf8?
I get this:
=> undefined method 'isutf8?' for "\0000\000.\0008\0006\000":String

What the heck is going on?! How do I convert ratio into a normal string that I can call ratio.to_f on?

Thanks.


2 Answers


Unpacking a binary string is generally called decoding. It looks like your data is in UTF-16, but you should find out which encoding it actually uses (e.g. by investigating the workflow/configuration that produced it) before assuming this is true.

In Ruby 1.9 (decode on the fly):

my_file = File.open(filename).set_encoding('UTF-16BE:UTF-8')
# the rest as in the original
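
For reference, here is a fuller sketch of the decode-on-the-fly approach. It assumes the file really is UTF-16BE without a byte order mark; the helper name read_ratios and the sample data are made up for illustration. Instead of calling set_encoding afterwards, it passes the external and internal encodings in the mode string.

```ruby
require 'tempfile'

# Hypothetical helper: returns the second tab-separated field of every
# line as a Float. Assumes the file is UTF-16BE without a byte order mark.
def read_ratios(path)
  ratios = []
  # "r:UTF-16BE:UTF-8" means: external encoding UTF-16BE, transcoded to
  # UTF-8 while reading, so each line arrives as an ordinary UTF-8 string.
  File.open(path, 'r:UTF-16BE:UTF-8') do |file|
    file.each_line do |line|
      fields = line.chomp.split("\t")
      ratios << fields[1].to_f
    end
  end
  ratios
end

# Build a small UTF-16BE sample file (made-up data) to try it out.
sample = Tempfile.new('ratios')
sample.close
File.binwrite(sample.path, "a\t0.86\tb\nc\t1.25\td\n".encode('UTF-16BE').b)

p read_ratios(sample.path) # prints [0.86, 1.25]
```

Using the block form of File.open also closes the file once reading is done.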

In Ruby 1.8 (read in the whole file, then decode and parse it; may not work for very large files):

require 'iconv'

# …

my_file = File.open(filename)
my_text = Iconv.conv('UTF-8', 'UTF-16BE', my_file.read)
my_text.each_line do |line|
 # the rest as in the original
end
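
Iconv was removed from the standard library in Ruby 2.0, so on newer Rubies the same read-then-decode idea can be sketched with String#encode instead. This is only an illustration and again assumes UTF-16BE input; the helper name utf16be_to_utf8 is made up.

```ruby
# Hypothetical helper: decode raw UTF-16BE bytes into a UTF-8 string.
# String#encode(destination, source) treats the receiver's bytes as
# `source`, whatever encoding the string is currently tagged with.
def utf16be_to_utf8(bytes)
  bytes.encode('UTF-8', 'UTF-16BE')
end

# The bytes from the question's error message, minus the stray trailing
# zero byte that split("\t") left behind.
raw = "\0000\000.\0008\0006"
ratio = utf16be_to_utf8(raw)
puts ratio.to_f # prints 0.86

# For a whole file, the raw bytes could come from File.binread(filename).
```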
Answered 2010-06-22T19:18:17.613

It looks like your input data is encoded as UTF-16 or UCS-2.

Try something like this:

require 'iconv'

ratio = Iconv.conv('UTF-8', 'UTF-16', line_array[1])
puts "Ratio is now '#{ratio}'."

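Come to think of it, you should run Iconv.conv on the whole line before calling split; otherwise there will be stray zero bytes at the end of the strings (unless you change the delimiter to "\000\t", which looks rather ugly).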
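
To make the ordering issue concrete, here is a small sketch comparing the two orders on one made-up line. It assumes UTF-16BE data and uses String#encode in place of Iconv.conv (Iconv is not available on Ruby 2.0 and later); the helper names are hypothetical.

```ruby
# One made-up line, encoded as UTF-16BE and tagged as raw bytes, the way
# it would look when read without any encoding information.
def sample_line
  "id\t0.86\tend\n".encode('UTF-16BE').b
end

# Splitting first: the tab's high byte (a zero) stays attached to the end
# of the preceding field, so the field has an odd number of bytes.
def split_raw(raw)
  raw.split("\t")
end

# Decoding first: the fields come out as ordinary UTF-8 strings.
def decode_then_split(raw)
  raw.encode('UTF-8', 'UTF-16BE').chomp.split("\t")
end

p split_raw(sample_line)[1].bytesize # prints 9 (four 2-byte characters plus one stray zero byte)
p decode_then_split(sample_line)     # prints ["id", "0.86", "end"]
```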

Answered 2010-06-22T17:55:11.147