I have the following code, which is part of an HTML page:

<td><a href="http://youtube.com">YouTube</a></td>
<td><a data-category="news" href="http://kathack.com/party/aems/dic/list">Reddit</a></td>
<td><a href="http://kathack.com/party/aems">Kathack</a></td>
<td><a data-category="news" href="http://www.nytimes.com">New York Times</a></td>

How can I search for /aems/dic/list and store the complete URL of the matching link?

2 Answers

Assuming you have a Mechanize::Page object named page:

page.at('a[href*="/aems/dic/list"]')[:href]
#=> "http://kathack.com/party/aems/dic/list"

Update

A longer example:

require 'mechanize'
agent = Mechanize.new
page = agent.get 'http://www.example.com/'
page.at('a[href*="/aems/dic/list"]')[:href]
#=> "http://kathack.com/party/aems/dic/list"
Answered at 2013-01-21T02:14:02.600

With Nokogiri, something like this:

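require 'nokogiri'

# text is assumed to hold the HTML from the question
fragment = Nokogiri::HTML::DocumentFragment.parse(text)
# find stops at the first match and, unlike return inside each, works outside a method
link = fragment.css('a').find { |a| a['href'].to_s.include?('/aems/dic/list') }
href = link && link['href']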
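
If neither gem is at hand, the same lookup can be roughly sketched with the standard library alone. This is a swapped-in technique, not the Nokogiri approach: a plain regex scan over double-quoted href attributes, which is brittle on real-world HTML and only meant to illustrate the "find the matching href and keep it" step. The names QUESTION_HTML, find_href, and absolute_url are hypothetical, and the snippet assumes every href in the markup is properly quoted.

```ruby
require 'uri'

# The HTML from the question, assuming every href is double-quoted.
QUESTION_HTML = <<~SNIPPET
  <td><a href="http://youtube.com">YouTube</a></td>
  <td><a data-category="news" href="http://kathack.com/party/aems/dic/list">Reddit</a></td>
  <td><a href="http://kathack.com/party/aems">Kathack</a></td>
  <td><a data-category="news" href="http://www.nytimes.com">New York Times</a></td>
SNIPPET

# Hypothetical helper: returns the first double-quoted href value that
# contains `fragment`, or nil when no link matches.
def find_href(html, fragment)
  html.scan(/href="([^"]*)"/).flatten.find { |href| href.include?(fragment) }
end

# Hypothetical helper: resolves an href against the page URL, so that a
# relative link also yields a complete URL. Absolute hrefs pass through.
def absolute_url(base, href)
  URI.join(base, href).to_s
end

# Store the complete URL in a variable for later use.
stored_url = find_href(QUESTION_HTML, '/aems/dic/list')
puts stored_url
```

For anything beyond a quick script, a real HTML parser is the safer choice, since a regex like this misses single-quoted or unquoted attributes.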
Answered at 2013-01-20T22:14:43.697