ruby - Convert UTF-8 to Unicode in Ruby

Question

The UTF-8 encoding of "龅" is E9BE85, and its Unicode code point is U+9F85. The following code does not work as expected:

irb(main):004:0> "龅"
=> "\351\276\205"
irb(main):005:0> Iconv.iconv("unicode","utf-8","龅").to_s
=> "\377\376\205\237"

PS: I'm using Ruby 1.8.7.
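To see why E9BE85 and U+9F85 describe the same character, here is a minimal sketch that decodes the three UTF-8 bytes by hand. A 3-byte UTF-8 sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx, and the code point is the concatenation of the x bits. The helper name `decode_utf8_3byte` is hypothetical and only handles 3-byte sequences; it is an illustration, not a general decoder.

```ruby
# Hypothetical helper: decode ONE 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
# Sequences of 1, 2 or 4 bytes are deliberately not handled.
def decode_utf8_3byte(b0, b1, b2)
  raise ArgumentError, "not a 3-byte lead byte" unless (b0 & 0xF0) == 0xE0
  unless (b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80
    raise ArgumentError, "bad continuation byte"
  end
  # 4 bits from the lead byte, 6 bits from each continuation byte
  ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

code_point = decode_utf8_3byte(0xE9, 0xBE, 0x85)
puts "U+%04X" % code_point  # => U+9F85
```

Only plain integer arithmetic is used, so the sketch should behave the same on 1.8.7 and on 1.9+.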

score 4 · Accepted Answer

Ruby 1.9+ handles Unicode much better than 1.8.7, so I strongly recommend running under 1.9.2 if at all possible.

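Part of the problem is that 1.8 doesn't understand that a UTF-8 (or other Unicode) character can be more than one byte long. 1.9 does understand this, and introduced methods such as String#each_char.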
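A small sketch of what "understanding multi-byte characters" looks like in 1.9+: the string knows its encoding, so character count and byte count differ, and String#ord returns the code point directly. This assumes a UTF-8 source file (hence the magic comment on the first line).

```ruby
# encoding: UTF-8
# In Ruby 1.9+ a string carries its encoding, so "龅" is 1 character made of 3 bytes.
s = "龅"

puts s.encoding          # => UTF-8
puts s.length            # => 1  (characters)
puts s.bytesize          # => 3  (bytes)
p s.bytes.to_a           # => [233, 190, 133]  (0xE9 0xBE 0x85)
puts "U+%04X" % s.ord    # => U+9F85  (ord returns the code point)
```

With `s.ord`, the conversion the question asks for becomes a single method call, with no Iconv involved.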

# encoding: UTF-8

require 'iconv'

RUBY_VERSION # => "1.9.2"
"龅".encoding # => #<Encoding:UTF-8>
"龅".each_char.entries # => ["龅"]
Iconv.iconv("unicode","utf-8","龅").to_s # => 

# ~> -:8:in `iconv': invalid encoding ("unicode", "utf-8") (Iconv::InvalidEncoding)
# ~>    from -:8:in `<main>'
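The bytes in the question, "\377\376\205\237", are octal for FF FE 85 9F: apparently a little-endian byte-order mark followed by U+9F85 in UTF-16 little-endian. So the original Iconv call seems to have converted correctly, just into a different representation than expected. The sketch below reproduces those bytes with String#encode, which replaces Iconv in 1.9+ (Iconv was deprecated in 1.9.3 and is no longer bundled with Ruby 2.0 and later). The reading of iconv's "unicode" as "UTF-16LE with BOM" is an assumption inferred from the output.

```ruby
# encoding: UTF-8
# Assumption: iconv's "unicode" produced UTF-16 little-endian with a byte-order mark (U+FEFF).
with_bom = "\uFEFF龅".encode("UTF-16LE")

p with_bom.bytes.to_a                              # => [255, 254, 133, 159]
p with_bom.bytes.map { |b| "%03o" % b }.join(" ")  # => "377 376 205 237"  (same as the question)

# The same character as UTF-16LE without a BOM, read back as 16-bit little-endian units.
p "龅".encode("UTF-16LE").unpack("v*")  # => [40837]  (0x9F85)
```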

To get a list of the encodings available to Iconv, do:

require 'iconv'
puts Iconv.list

It's a long list, so I won't include it here.
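On 1.9+ the closest built-in counterpart is Ruby's own Encoding registry, which lists the names String#encode accepts. Note this is not the same list as `Iconv.list`; it is a sketch of how to check for a target encoding without Iconv.

```ruby
# Ruby 1.9+: names known to the built-in Encoding class (aliases included),
# i.e. what String#encode accepts. This is not Iconv's list.
names = Encoding.name_list

puts names.size
puts names.include?("UTF-16BE")            # => true
puts names.grep(/\AUTF-16/i).sort.inspect  # the UTF-16 variants available
```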

score 4 · Accepted Answer


You can try this:

"%04x" % "龅".unpack("U*")[0]

=> "9f85"
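The same `unpack("U*")` call decodes every character in a string, not just the first, and `pack("U*")` reverses it. A minimal sketch; the helper name `code_points` is hypothetical. `unpack("U*")` reads the bytes as UTF-8 regardless of version, so this should also work on 1.8.7, where the magic comment is simply ignored.

```ruby
# encoding: UTF-8
# Hypothetical helper: list every character's code point in U+XXXX notation.
def code_points(str)
  str.unpack("U*").map { |cp| "U+%04X" % cp }
end

p code_points("龅")            # => ["U+9F85"]
p code_points("a龅")           # => ["U+0061", "U+9F85"]
p [0x9F85].pack("U*") == "龅"  # => true  (code point back to UTF-8)
```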

answered 2011-06-01T17:17:06.770

score 3 · Accepted Answer

You should use UNICODEBIG// as the target encoding:

irb(main):014:0> Iconv.iconv("UNICODEBIG//","utf-8","龅")[0].each_byte {|b| puts b.to_s(16)}
9f
85
=> "\237\205"
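A 1.9+ sketch of the same idea without Iconv, assuming UNICODEBIG means big-endian 16-bit units with no byte-order mark. One caveat: a 16-bit unit equals the code point only for characters in the Basic Multilingual Plane. Above U+FFFF, UTF-16 uses a surrogate pair, so `unpack("U*")` is the safer way to get actual code points. U+1F600 is used below purely as an example of such a character.

```ruby
# encoding: UTF-8
# Assumption: UNICODEBIG ~ big-endian 16-bit units without a BOM, i.e. UTF-16BE here.
be = "龅".encode("UTF-16BE")
be.each_byte { |b| puts b.to_s(16) }  # prints 9f, then 85
p be.unpack("n*")                     # => [40837]  (0x9F85)

# Caveat: outside the BMP one character becomes two 16-bit units (a surrogate pair).
p "\u{1F600}".encode("UTF-16BE").unpack("n*").map { |u| u.to_s(16) }  # => ["d83d", "de00"]
p "\u{1F600}".unpack("U*").map { |u| u.to_s(16) }                     # => ["1f600"]
```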
