
I need to sanitize some text sent to an email service provider (SendGrid) that does not support Unicode in the recipient name unless it is \u escaped.

Given the UTF-8 string s = "Pablö", how can I "\u escape" any Unicode characters in the string so that I get "Pabl\u00f6"?

Converting to JSON also escapes the quotes (which I don't want):

"Pablö".to_json
=> "\"Pabl\\u00f6\""

What I'm looking for is something just like .force_encoding('binary') except for Unicode. Inspecting Encoding.aliases.values.uniq I don't see anything like 'unicode'.


1 Answer


I'm going to assume that everything is UTF-8 because we're not cavemen banging rocks together.

to_json isn't escaping quotes; it is adding quotes inside the string (because JSON requires strings to be quoted), and then inspect escapes those quotes (and the backslash) when the result is displayed.

These quotes from to_json should always be there so you could just strip them off:

"Pablö".to_json[1..-2] # Lots of ways to do this...
=> "Pabl\\u00f6"
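
To check that the doubled backslash is only how inspect displays the string, you can compare puts with p on a string that holds a single literal backslash. This is a small sketch; the variable name escaped is just for illustration.

```ruby
# Single quotes keep the backslash literal, so this string is
# P, a, b, l, \, u, 0, 0, f, 6 (ten characters, one backslash).
escaped = 'Pabl\u00f6'

puts escaped      # prints the raw contents: Pabl\u00f6
p escaped         # prints the inspect form:  "Pabl\\u00f6"

puts escaped.length             # 10
puts escaped.chars.count('\\')  # 1
```

So the string you would hand to the mail API contains one backslash, not two.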

Keep in mind, however, that the behavior of to_json and UTF-8 depends on which JSON library you're using and possibly other things. For example, in my stock Ruby 2.2, the standard JSON library leaves UTF-8 alone; the JSON specification is quite happy with UTF-8 so why bother encoding it? So you might want to do it yourself with something like:

s.chars.map { |c| c.ord > 127 ? '\u%.4x' % c.ord : c }.join

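Anything above 127 is out of ASCII range, so that simple ord test takes care of anything like ö, ñ, µ, ... You'll want to adjust the map block if you need to encode other characters (such as \n). Note that code points above U+FFFF (emoji, for example) come out with five or more hex digits, which is not a valid JSON-style \u escape; those would need to be written as a UTF-16 surrogate pair.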
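
If you need to cover code points beyond U+FFFF as well, something along these lines might work. It is only a sketch: escape_non_ascii is a made-up helper name (not part of Ruby or any JSON library), and it assumes the receiving service wants JSON-style escapes, where such characters are written as a UTF-16 surrogate pair.

```ruby
# Hypothetical helper: replaces every non-ASCII character with \uXXXX
# escapes and leaves ASCII characters untouched.
def escape_non_ascii(str)
  str.each_char.map do |c|
    cp = c.ord
    if cp < 128
      c                               # plain ASCII passes through
    elsif cp <= 0xFFFF
      format('\u%04x', cp)            # one four-digit escape
    else
      # Above U+FFFF: split into a high and a low surrogate.
      v = cp - 0x10000
      format('\u%04x\u%04x', 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))
    end
  end.join
end

puts escape_non_ascii("Pablö")   # prints Pabl\u00f6
```

The single-quoted format strings matter here: inside single quotes \u is a literal backslash followed by u, whereas in double quotes Ruby would try to parse it as a Unicode escape.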

Answered 2015-01-16T19:32:11.823