ruby - Getting part of an href attribute with hpricot

Question

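I think I need a combination of hpricot and regular expressions. I need to search for "a" tags whose "href" attribute starts with "abc/", and return the text that follows, up to the next forward slash "/".

So, given: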

<a href="/abc/12345/xyz123/">One</a>
<a href="/abc/67890/xyzabc/">Two</a>

I need to get back: "12345" and "67890"

Can anyone lend a hand? I've been struggling with this.

score 0 · Accepted Answer

How about splitting the string on "/"?

(I don't know Hpricot, but going by the documentation):

ids = doc.search("a[@href]").map do |a|
    # Index 2, because the href starts with '/', so element 0 is an empty string
    a["href"].split("/")[2]
end
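
The snippet above takes the second path segment of every link, whatever its prefix. Below is a minimal sketch of how the split approach could also enforce the "/abc/" prefix from the question. It works on plain href strings, so it does not depend on Hpricot; the helper name `id_after_prefix` and its `prefix` parameter are made up for illustration.

```ruby
# Hypothetical helper (not part of Hpricot): returns the path segment that
# follows `prefix`, or nil when the href does not start with that prefix.
def id_after_prefix(href, prefix = "/abc/")
  return nil unless href.start_with?(prefix)
  # Drop the prefix, then keep everything up to the next "/".
  segment = href[prefix.length..-1].split("/").first
  segment.nil? || segment.empty? ? nil : segment
end

hrefs = ["/abc/12345/xyz123/", "/abc/67890/xyzabc/", "/other/111/"]
ids = hrefs.map { |h| id_after_prefix(h) }.compact
p ids # prints ["12345", "67890"]
```

With a parsed document, the same helper could be fed the href of each matched element instead of the hard-coded array.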

score 0 · Accepted Answer

Or use a regular expression:

s = '<a href="/abc/12345/xyz123/">One</a>'
s =~ /abc\/([^\/]*)/  # capture everything after "abc/" up to the next "/"
$1 # => "12345"
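
Note that `=~` only finds the first match in a string. When several links sit in one string, a minimal sketch using `String#scan` can collect every ID in one pass. It assumes the HTML is available as a plain string and that the attribute is always written with double quotes; the variable names and the third sample link are illustrative.

```ruby
html = <<~HTML
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
  <a href="/other/99999/">Three</a>
HTML

# href="/abc/ ties the match to the start of the attribute value;
# the capture group takes everything up to the next "/" or closing quote.
ids = html.scan(%r{href="/abc/([^/"]+)}).flatten
p ids # prints ["12345", "67890"]
```

Matching raw HTML with a regex is brittle (single quotes or extra whitespace would break it), so for real pages a parser is probably the safer choice.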

score 0 · Accepted Answer

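You don't need a regex, but you can use one. Here are two examples, one with a regex and one without. Both use Nokogiri, which should be compatible enough with Hpricot for your purposes, and both use CSS accessors: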

require 'nokogiri'

html = %q[
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
]

doc = Nokogiri::HTML(html)
doc.css('a[@href]').map{ |h| h['href'][/(\d+)/, 1] } # => ["12345", "67890"]
doc.css('a[@href]').map{ |h| h['href'].split('/')[2] } # => ["12345", "67890"]
