We're cleaning up some errors on our site after migration from ruby 1.8.7 to 1.9.3, Rails 3.2.12. We have one encoding error left -- Bing is sending requests for URLs in the form
/search?q=author:\"Andr\xc3\xa1s%20Guttman\"
(This reads /search?q=author:"András Guttman"
, where the á
is escaped).
In fairness to Bing, we were the ones that gave them those bogus URLs, but ruby 1.9.3 isn't happy with them any more.
Our server is currently returning a 500. Rails is returning the error "Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT"
I am unable to reproduce this error in a browser, or via curl
or wget
from OS X or Linux command line.
I want to send a 301 redirect back with a properly encoded URL.
I am guessing that I want to:
- detect that the URL has old UTF-8 then if it is malformed, only
- use
String#encode
to get from old to new UTF-8 - use
CGI.escape()
to %-encode the URL - 301 redirect to the corrected URL
So I have read a lot and am not sure how (or if) I can detect this bogus URL. I need to detect because otherwise I would have to 301 everything!
When I try in irb I get these results:
1.9.3p392 :015 > foo = "/search?q=author:\"Andr\xc3\xa1s%20Guttman\""
=> "/search?q=author:\"András%20Guttman\""
1.9.3p392 :016 > "/search?q=author:\"Andr\xc3\xa1s%20Guttman\"".encoding
=> #<Encoding:UTF-8>
1.9.3p392 :017 > foo.encoding
=> #<Encoding:UTF-8>
I have read this SO post but I am not sure if I have to go this far or even if this applies.
[Update: since posting, we have added a call to the code in the SO post linked above prior to all requests.]
So the question is: how can I detect the old-style encoding so that I can do the other steps.