ruby - How to convert a string to UTF-8 in Ruby

Question

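I'm writing a crawler that uses Hpricot. It downloads a list of strings from a web page, and then I try to write it to a file. Something is wrong with the encoding: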

"\xC3" from ASCII-8BIT to UTF-8

Items that render correctly on the web page come out like this when printed:

DÃ©veloppement

str.encoding returns UTF-8, so force_encoding('UTF-8') doesn't help. How can I convert this to readable UTF-8?

score 69 · Accepted Answer

Your string appears to have been encoded the wrong way:

"DÃ©veloppement".encode("iso-8859-1").force_encoding("utf-8")
#=> "Développement"
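
To see why this works, it can help to reproduce the damage first. The sketch below assumes the garbled text came from UTF-8 bytes that were misread as ISO-8859-1 and then transcoded to UTF-8; the variable names are illustrative and not part of the question.

```ruby
# Simulate the damage: UTF-8 bytes mislabeled as ISO-8859-1, then transcoded to UTF-8.
original = "Développement"
garbled  = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Undo it: transcode back to one byte per character, then relabel those bytes as UTF-8.
repaired = garbled.encode("ISO-8859-1").force_encoding("UTF-8")

puts garbled                  # the mojibake form from the question
puts repaired                 # the readable form
puts repaired.valid_encoding?
```

If the encode step raises Encoding::UndefinedConversionError, the text probably contains characters outside ISO-8859-1, and the assumption about how it was damaged does not hold.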

score 58 · Accepted Answer

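It seems your string thinks it is UTF-8, but in reality it is something else, probably ISO-8859-1.

First define (force) the correct encoding, then convert it to UTF-8.

In your example: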

puts "DÃ©veloppement".encode('iso-8859-1').encode('utf-8')

An alternative is:

puts "\xC3".force_encoding('iso-8859-1').encode('utf-8') #-> Ã

If the Ã makes no sense, try another encoding.
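
The two calls do different things: force_encoding only changes the label attached to the bytes, while encode rewrites the bytes themselves. A minimal sketch, assuming a single ISO-8859-1 byte for "é" that arrived with a wrong UTF-8 label (the input is made up for illustration):

```ruby
# The byte for "é" in ISO-8859-1, wrongly labeled as UTF-8.
raw = "\xE9".dup.force_encoding("UTF-8")
puts raw.valid_encoding?     # \xE9 on its own is not valid UTF-8

# Step 1: relabel (bytes unchanged). Step 2: transcode (bytes rewritten).
fixed = raw.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed
puts fixed.bytes.inspect

# When the garbled text is already valid UTF-8, encode followed by encode
# is a round trip and hands back the same string.
garbled = "D\u00C3\u00A9veloppement"
round_trip = garbled.encode("ISO-8859-1").encode("UTF-8")
puts round_trip == garbled
```

So for text that is already valid UTF-8 but garbled, the last step has to be force_encoding rather than encode.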

score 5 · Accepted Answer

"ruby 1.9: invalid byte sequence in UTF-8" describes another good way to do it with less code:

file_contents.encode!('UTF-16', 'UTF-8')
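
On its own, that call leaves the string in UTF-16. As far as I recall, the referenced question transcodes through UTF-16 with replacement options to drop invalid bytes and then converts back; on Ruby 2.1 and later, String#scrub does the same cleanup directly. A sketch, with a made-up file_contents value standing in for data read from disk:

```ruby
# Hypothetical input: UTF-8 text with one stray invalid byte (\xC3 followed by a space).
file_contents = "D\xC3 veloppement".dup.force_encoding("UTF-8")

# Round trip through UTF-16, dropping bytes that are invalid in the source encoding.
cleaned = file_contents.encode("UTF-16", "UTF-8", invalid: :replace, replace: "")
cleaned = cleaned.encode("UTF-8", "UTF-16")
puts cleaned

# Ruby 2.1+: String#scrub drops (or replaces) invalid byte sequences without the round trip.
scrubbed = file_contents.scrub("")
puts scrubbed
```

Note that this only discards invalid bytes; it does not repair text that was decoded with the wrong encoding, as in the question.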
