ruby - Unescaping characters in a string with Ruby

Question

Given a string in the following format (the Posterous API returns posts in this format):

s="\\u003Cp\\u003E"

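How can I convert it into the actual ASCII characters, so that s="<p>"?

On OSX, I successfully used Iconv.iconv('ascii', 'java', s), but once deployed to Heroku, I get an Iconv::IllegalSequence exception. I'm guessing that the system Heroku deploys to doesn't support the java encoder.

I am using HTTParty to make a request to the Posterous API. If I make the same request using curl, I do not get the doubled backslashes.

From the HTTParty GitHub page:

Automatic parsing of JSON and XML into ruby hashes based on response content-type

The Posterous API returns JSON (no doubled backslashes), and HTTParty's JSON parsing is inserting the extra backslash.

Here is a simple example of the way I'm using HTTParty to make the request.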

class Posterous
  include HTTParty
  base_uri "http://www.posterous.com/api/2"
  basic_auth "username", "password"
  format :json
  def get_posts
    response = Posterous.get("/users/me/sites/9876/posts?api_token=1234")
    # snip, see below...
  end
end

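Substitute valid values for the obvious information (username, password, site_id, api_token).

At the snip, response.body contains a Ruby string holding the JSON, and response.parsed_response contains a Ruby hash object that HTTParty created by parsing the JSON response from the Posterous API.

In both cases, Unicode sequences such as \u003C have been changed to \\u003C.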

score 3 · Accepted Answer

I found a solution to this problem. I ran across this gist. elskwid had the same problem and ran the string through a JSON parser:

s = ::JSON.parse("\\u003Cp\\u003E")

Now, s = "<p>".
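
Depending on the version of the json library, passing the bare escaped text may raise JSON::ParserError, since \u003Cp\u003E on its own is not a complete JSON value. A minimal sketch that wraps the text as a one-element JSON array first is shown here. The helper name unescape_json_text is hypothetical, and the sketch assumes the text contains no unescaped double quotes or raw control characters:

```ruby
require 'json'

# Hypothetical helper: decodes JSON-style escapes (such as \u003C) in +text+.
# Assumes +text+ has no unescaped double quotes or raw control characters.
def unescape_json_text(text)
  # Wrap the text so the parser sees a valid document: ["..."]
  ::JSON.parse(%Q(["#{text}"])).first
end

s = unescape_json_text("\\u003Cp\\u003E")
puts s
```

Wrapping in an array rather than passing a bare quoted string keeps the sketch usable with older parsers that only accept an object or array at the top level.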

score 1 · Accepted Answer

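I ran into this exact problem the other day. There is a bug in the JSON parser that HTTParty uses (the Crack gem): basically, it uses a case-sensitive regexp for the Unicode sequences, so because Posterous puts out A-F rather than a-f, Crack isn't unescaping them. I submitted a pull request to fix this.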
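
To illustrate the kind of bug described, here is a small sketch. The two patterns are hypothetical stand-ins, not Crack's actual source; they only show how a lowercase-only hex class skips escapes written with uppercase digits:

```ruby
# Hypothetical patterns for illustration only (not taken from Crack).
lowercase_only = /\\u([0-9a-f]{4})/     # misses escapes that use A-F
any_case       = /\\u([0-9a-fA-F]{4})/  # accepts both cases

# Replaces every escape matched by +pattern+ with the character it names.
def unescape_with(pattern, text)
  text.gsub(pattern) { [Regexp.last_match(1).hex].pack("U") }
end

input  = "\\u003Cp\\u003E"
missed = unescape_with(lowercase_only, input)
fixed  = unescape_with(any_case, input)

puts missed  # escapes left untouched because of the uppercase C and E
puts fixed
```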

In the meantime, HTTParty nicely lets you specify an alternate parser, so you can use ::JSON.parse and bypass Crack entirely, like this:

class JsonParser < HTTParty::Parser
  def json
    ::JSON.parse(body)
  end
end

class Posterous
  include HTTParty
  parser ::JsonParser

  #....
end

score 1 · Accepted Answer

You can also use pack:

"a\\u00e4\\u3042".gsub(/\\u(....)/){[$1.hex].pack("U")} # "aäあ"

Or to go the other way:

"aäあ".gsub(/[^ -~\n]/){"\\u%04x"%$&.ord} # "a\\u00e4\\u3042"
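
Note that the .... in the first pattern matches any four characters, so a stray \u followed by non-hex text would also be replaced. A slightly stricter sketch uses \h (a hex digit of either case). The helper names are hypothetical, and neither helper attempts to handle characters outside the Basic Multilingual Plane (surrogate pairs):

```ruby
# Hypothetical helpers built on the two one-liners above.

# Turns \uXXXX escapes (four hex digits) into the characters they name.
def unescape_unicode(text)
  text.gsub(/\\u(\h{4})/) { [Regexp.last_match(1).hex].pack("U") }
end

# Turns everything outside printable ASCII and newline into \uXXXX.
def escape_non_ascii(text)
  text.gsub(/[^ -~\n]/) { |ch| format("\\u%04x", ch.ord) }
end

escaped  = escape_non_ascii("aäあ")
restored = unescape_unicode(escaped)

puts escaped
puts restored
```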

score 0 · Accepted Answer

The doubled backslashes almost look like a regular string being viewed in a debugger.

The string "\u003Cp\u003E" really is "<p>": \u003C is the Unicode escape for < and \u003E is the escape for >.

>> "\u003Cp\u003E"  #=> "<p>"

If you are truly getting the string with doubled backslashes then you could try stripping one of the pair.

As a test, see how long the string is:

>> "\\u003Cp\\u003E".size #=> 13
>> "\u003Cp\u003E".size #=> 3
>> "<p>".size #=> 3

All the above was done using Ruby 1.9.2, which is Unicode aware. v1.8.7 wasn't. Here's what I get using 1.8.7's IRB for comparison:

>> "\u003Cp\u003E" #=> "u003Cpu003E"
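
One way to check whether the doubled backslashes are really in the string, or are just an artifact of how it is displayed, is to compare inspect with puts. A small sketch, using a single-quoted literal so that each escape keeps one literal backslash:

```ruby
# Single quotes keep \u as two literal characters: a backslash and "u".
literal = '\u003Cp\u003E'

puts literal.inspect  # inspect escapes each backslash, so they look doubled
puts literal          # puts prints the raw characters, one backslash each

length      = literal.size
backslashes = literal.scan("\\").size  # backslashes actually in the string

puts length
puts backslashes
```

If the count of backslashes matches the number of escapes, the string holds single backslashes and the doubling is only in the display.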
