ruby - Parsing and replacing multiple links, but not when one contains another

Question

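I can't figure out how to (easily) keep link (2) from replacing the beginning of link (1). I'd appreciate an answer in Ruby, but if you just work out the logic, that's fine too.

The output should be: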

 message = "For Last Minute rentals, please go to:
    <span class='external_link' href-web='http://www.mydomain.com/thepage'>http://www.mydomain.com/thepage</span> (1)

    For more information about our events, please visit our website: 
    <span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

But it is:

    message = "For Last Minute rentals, please go to:
    <span class='external_link' href-web='<span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span>/thepage'><span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span>/thepage</span> (1)

    For more information about our events, please visit our website: 
    <span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

Here is the code (edited: took out the spans):

     message = "For Last Minute rentals, please go to:
    http://www.mydomain.com/thepage

    For more information about our events, please visit our website: 
    http://www.mydomain.com"

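   links_found = URI.extract(message, ['http', 'https'])

   # Each pass rescans the whole message, so the shorter URL also matches
   # inside the span already inserted for the longer one.
   links_found.each do |link_found|
     message.gsub!(link_found, "<span class='external_link' href-web='#{link_found}'>#{link_found}</span>")
   end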

Ideas?

score 0 · Accepted Answer

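First, rule one: when dealing with HTML or XML, don't bother with string manipulation or regular expressions for anything but the most trivial of tasks. That way lies madness.

Instead, keep your sanity and use a real parser. For Ruby, I highly recommend looking only at Nokogiri - it just works.

Consider this code: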

require 'nokogiri'

message = "For Last Minute rentals, please go to:
<span class='external_link' href-web='http://www.mydomain.com/thepage'>http://www.mydomain.com/thepage</span> (1)

For more information about our events, please visit our website: 
<span class='external_link' href-web='http://www.mydomain.com'>http://www.mydomain.com</span> (2)"

doc = Nokogiri::HTML(message)

external_spans = doc.search('span.external_link')

url1 = external_spans[0]['href-web'] # => "http://www.mydomain.com/thepage"
text1 = external_spans[0].text       # => "http://www.mydomain.com/thepage"
url2 = external_spans[1]['href-web'] # => "http://www.mydomain.com"
text2 = external_spans[1].text       # => "http://www.mydomain.com"

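url1 and text1 are the URL and text from span 1; url2 and text2 are those from span 2.

I'm not sure what you want to do with them because, after a cursory glance, I can't see any difference between your source and the desired output. Once you have them, though, you are free to do whatever you want. A parser like Nokogiri lets you retrieve information from an HTML or XML DOM, replace it, move content around, and even splice in new content.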

score 0 · Accepted Answer

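I'm guessing your problem lies with URI.extract. When it goes through message, it pulls out every instance of "http", and for the first line it finds matches inside the <span>.

To clarify further, links_found will be an array containing both <span...href-web:... and http...</span>. Since you are just passing link_found to gsub as the pattern to match, it will replace every object in the links_found[] array.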
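
One way to sidestep the overlap is to do every substitution in a single pass, so text that has already been wrapped is never scanned again. The sketch below is only an illustration under stated assumptions, not code from the question or either answer: `wrap_links` is a hypothetical helper name, and it assumes the input is plain text (no spans yet), as in the question's edited code. It builds one alternation with `Regexp.union`, trying the longest URLs first so that a URL which is a prefix of another cannot win the match.

```ruby
require 'uri'

# Hypothetical helper (not from the original post): wraps every URL in
# `links` with a span in ONE gsub pass, so inserted markup is never rescanned.
def wrap_links(message, links)
  # Longest first: alternation takes the first alternative that matches,
  # so the full URL must be tried before any URL that is a prefix of it.
  pattern = Regexp.union(links.uniq.sort_by { |link| -link.length })
  message.gsub(pattern) do |link|
    "<span class='external_link' href-web='#{link}'>#{link}</span>"
  end
end

message = "For Last Minute rentals, please go to:
http://www.mydomain.com/thepage

For more information about our events, please visit our website:
http://www.mydomain.com"

links_found = URI.extract(message, ['http', 'https'])
puts wrap_links(message, links_found)
```

Because `gsub` with a block moves on past each match, the shorter URL is only ever replaced where it stands alone. Looping over the links and calling `gsub!` once per link, as in the question, cannot give that guarantee.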
