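Say I want to fetch a page from the web into my application and parse it in some way. How do I do that? Where should I start? Do I need any plugins or gems? What is your usual approach to this kind of task?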

2 Answers

You should try a gem such as Hpricot (wiki) or Nokogiri.

Hpricot example:

require 'open-uri'
require 'rubygems'
require 'hpricot'

html = Hpricot(open(an_url).read)
# This would search for any images inside a paragraph (XPath)
html.search('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.search('img.test')

Nokogiri example:

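require 'open-uri'
require 'rubygems'
require 'nokogiri'

# an_url is a placeholder for the address of the page to fetch.
# URI.open is required on Ruby 3.0+, where plain open() no longer accepts URLs.
html = Nokogiri::HTML(URI.open(an_url).read)
# This would search for any images inside a paragraph (XPath)
html.xpath('/html/body//p//img')
# This would search for any images with the class "test" (CSS selector)
html.css('img.test')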

Nokogiri is generally faster. Both libraries offer a lot of functionality.

answered 2009-09-24T05:17:35.420
What you want to do is called “

Ryan Bates made two excellent screencasts on this topic:

Personally, I prefer Nokogiri. You can also take a look at this answer: Best Rails HTML Parser

answered 2012-02-07T14:11:32.490