Regular expressions are the wrong tool for handling HTML (or XML) 99.9% of the time. Instead, use a parser such as Nokogiri:
require 'nokogiri'
html = '<img src="https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7" style="">'
doc = Nokogiri::HTML(html)
url = doc.at('img')['src'] # => "https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7"
doc.at('img')['style'] # => ""
Once you've retrieved the data you want, such as the `src`, use another "right" tool, such as URI:
require 'uri'
scheme, userinfo, host, port, registry, path, opaque, query, fragment = URI.split(url)
scheme # => "https"
userinfo # => nil
host # => "filin.mail.ru"
port # => nil
registry # => nil
path # => "/pic"
opaque # => nil
query # => "width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7"
fragment # => nil
query_parts = Hash[URI.decode_www_form(query)]
query_parts # => {"width"=>"90", "height"=>"90", "email"=>"multicc@multicc.mail.ru", "version"=>"4", "build"=>"7"}
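As an alternative to `URI.split`, `URI.parse` returns a URI object whose components can be read by name. This avoids remembering the order of nine positional values. One difference to be aware of: for an `https` URL with no explicit port, the parsed object reports the scheme's default port, 443, rather than `nil`. The sketch below reuses the URL from above. It uses `.to_h` in place of `Hash[...]`, which is equivalent for a list of key/value pairs. It then re-encodes the decoded pairs with `URI.encode_www_form`.

```ruby
require 'uri'

url = 'https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7'

# URI.parse returns a URI::HTTPS object for an https:// URL
uri = URI.parse(url)
uri.scheme # => "https"
uri.host   # => "filin.mail.ru"
uri.port   # => 443 (default port for https, unlike URI.split's nil)
uri.path   # => "/pic"

# Decode the query string into a Hash; %40 is decoded to "@"
params = URI.decode_www_form(uri.query).to_h
params['email'] # => "multicc@multicc.mail.ru"

# Going the other way: encode the pairs back into a query string ("@" becomes "%40" again)
uri.query = URI.encode_www_form(params)
uri.to_s == url # => true
```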