ruby - Scraping/parsing Google search results in Ruby

Question

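Suppose I have the entire HTML of a Google search results page. Does anyone know of any existing code (Ruby?) that scrapes/parses the first page of Google search results? Ideally it would handle the Shopping results and Video results sections, which can appear anywhere on the page.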

If not, what is the best Ruby-based screen-scraping tool in general?

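To clarify: I know that fetching Google search results programmatically or through an API is difficult or impossible, and that simply curling the results page has many problems. There is consensus on Stack Overflow about both points. My question is a different one.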

score 9 · Accepted Answer

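This should be fairly simple; take a look at Ryan Bates's "Screen Scraping with ScrAPI" screencast. You can also do without a dedicated scraping library and just stick with something like Nokogiri.

From the Nokogiri documentation: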

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we're interested in...
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end

score 3 · Accepted Answer

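It's not clear to me why you want to screen-scrape in the first place. Perhaps the REST search API would be a better fit? It returns results as JSON, which is easier to parse and saves bandwidth.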

For example, if your search is "foo bar", you can send a GET request to http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=foo+bar and process the response.
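A minimal sketch of that request/response cycle using only the standard library. It builds the URL quoted above and parses a JSON body. The field names `responseData`, `results`, `titleNoFormatting` and `unescapedUrl` are assumptions about that API's response format, and the endpoint may no longer respond, so the example parses a hand-written sample body instead of making a live call.

```ruby
require 'json'
require 'uri'

# Endpoint quoted in the answer above.
ENDPOINT = 'http://ajax.googleapis.com/ajax/services/search/web'

# Build the GET URL; encode_www_form turns the space in "foo bar" into '+'.
def search_url(query)
  "#{ENDPOINT}?#{URI.encode_www_form(v: '1.0', q: query)}"
end

# Return [title, url] pairs from a response body (field names assumed).
# dig returns nil if responseData is null, so errors yield an empty list.
def parse_results(body)
  data = JSON.parse(body)
  results = data.dig('responseData', 'results') || []
  results.map { |r| [r['titleNoFormatting'], r['unescapedUrl']] }
end

puts search_url('foo bar')
# → http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=foo+bar

# Hypothetical sample body, not real API output.
sample = '{"responseData":{"results":[{"titleNoFormatting":"Foo Bar",' \
         '"unescapedUrl":"http://example.com/"}]},"responseStatus":200}'
p parse_results(sample)
# → [["Foo Bar", "http://example.com/"]]
```

To fetch a live body you could pass `URI(search_url('foo bar'))` to `Net::HTTP.get` (after `require 'net/http'`); HTTParty, as suggested in another answer, does the same and can parse JSON responses for you.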

For details, see "Google Search REST API" or Google's developer pages.

score 0 · Accepted Answer


I'd suggest HTTParty + Google's Ajax Search API.

answered 2010-05-08T10:17:35.960

score -1 · Accepted Answer

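I don't know of any Ruby-specific code, but this google scraper may help you. It's an online demo tool that scrapes and parses Google results. The most interesting part is the accompanying article, which explains the parsing process in PHP, but the approach applies to Ruby and any other programming language.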

score -1 · Accepted Answer

You should be able to achieve your goal easily with Mechanize.

If you already have the results, then all you need is Hpricot or Nokogiri.
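Once Nokogiri has given you the `href` of each result, the links may not point straight at the target page: Google has at times wrapped organic results in a redirect of the form `/url?q=<target>&sa=...`. The sketch below, using only the standard library, unwraps such links and leaves ordinary ones alone. The method name is hypothetical, and the `/url` path and `q` parameter are assumptions about markup that changes over time.

```ruby
require 'uri'

# Hypothetical helper: return the real target of a "/url?q=..." redirect
# link, or the href unchanged if it is not such a link (or not a valid URI).
def unwrap_google_href(href)
  uri = URI.parse(href)
  return href unless uri.path == '/url' && uri.query

  params = URI.decode_www_form(uri.query).to_h   # decodes %XX escapes
  params['q'] || href
rescue URI::InvalidURIError
  href
end

puts unwrap_google_href('/url?q=http%3A%2F%2Fexample.com%2Fpage&sa=U')
# → http://example.com/page
puts unwrap_google_href('http://example.com/direct')
# → http://example.com/direct
```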

score -1 · Accepted Answer

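As Google keeps changing and keeps extending the structure of its results (rich snippets, Knowledge Graph, direct answers, etc.), scraping is getting harder and harder. We built a service that handles part of this complexity, and we do have a Ruby library for it. It's very simple to use: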

require 'google_search_results'  # assumed require path for the service's gem

query = GoogleSearchResults.new q: "coffee"

# Get the parsed Google results as a Ruby hash
hash_results = query.get_hash
