I am using Ruby to scrape webpages that sometimes return redirects, which I want to follow. There are many Ruby gems that do that, but there is a problem: Ruby's URI.parse explodes on some URIs that are technically invalid but work in browsers, such as "http://www.google.com/?q=<>".
URI.parse("http://www.google.com/?q=<>") #=> raises URI::InvalidURIError
require 'addressable/uri'
Addressable::URI.parse("http://www.google.com/?q=<>") #=> works
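As far as I can tell, the failure comes only from the unescaped "<" and ">" characters. Here is a minimal sketch, assuming it is acceptable to percent-encode anything outside the RFC 3986 character set before parsing. The escape_unsafe helper is a hypothetical name of my own, not part of any library, and I have not tested it against the redirect case:

```ruby
require 'uri'

# Hypothetical helper (not part of any library): percent-encode every
# character outside the RFC 3986 reserved/unreserved sets. Existing "%"
# escapes are left alone because "%" itself is in the allowed set.
def escape_unsafe(url)
  url.gsub(/[^A-Za-z0-9\-._~:\/?\[\]@!$&'()*+,;=%#]/) do |ch|
    # Encode each byte of the character (handles multi-byte UTF-8 too).
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end

escaped = escape_unsafe("http://www.google.com/?q=<>")
uri = URI.parse(escaped) # parses once the offending characters are escaped
```

This only helps when I control the string being parsed, which is not the case for a Location header handled inside an HTTP client library.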
All the HTTP client libraries I have tried (HTTParty, Faraday, RestClient) break when they encounter such a URI in a redirect (this is on Ruby 1.9.3):
rest-client:
require 'rest-client'
RestClient.get("http://bitly.com/ReeuYv") #=> explodes
faraday:
require 'faraday'
require 'faraday_middleware'
Faraday.use(FaradayMiddleware::FollowRedirects)
Faraday.get("http://bitly.com/ReeuYv") #=> explodes
httparty:
require 'httparty'
HTTParty.get("http://bitly.com/ReeuYv") # => explodes
open-uri:
require 'open-uri'
open("http://bitly.com/ReeuYv") # => explodes
What can I do to make these libraries follow redirects to such URIs?