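First, the page is not rendered by JavaScript. Second, the request may return nothing, because Google blocks requests that do not send a browser-like user-agent header (check what your user-agent is). Third, if you only want to retrieve the first result, you can use Nokogiri's at_css / at_xpath shortcuts instead of css / xpath, for example: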
doc.at_css(".yuRUbf a h3/text()") #=> Harry Potter: Toys & Games - Amazon.co.uk ...
Code:
require 'nokogiri'
require 'httparty'
# send a browser-like user-agent so Google does not block the request
headers = {
"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
params = {
q: "Amazon.co.uk: Toys - Harry Potter: Toys & Games",
hl: "en"
}
response = HTTParty.get('https://www.google.com/search',
query: params,
headers: headers)
doc = Nokogiri::HTML(response.body)
# extract all organic results (titles first, then links)
puts doc.css(".yuRUbf a h3/text()"),
doc.css(".yuRUbf a/@href")
---
=begin
harry potter: Toys Store - Amazon.co.uk
harry potter toys - Amazon.com
harry potter: Toys & Games - Amazon.com
harry potter toys: Toys & Games - Amazon.com
Toys & Games - Amazon.com
Harry Potter: Toys & Games - Amazon.com
1-48 of 405 results for "harry potter lego" - Amazon
harry potter lego sets - Amazon.com
https://www.amazon.co.uk/Toys-Games-Harry-Potter/s?rh=n%3A468292%2Cp_89%3AHarry+Potter
https://www.amazon.co.uk/harry-potter-toys/s?k=harry+potter+toys
https://www.amazon.co.uk/harry-potter-Toys-Store/s?k=harry+potter&rh=n%3A468292
https://www.amazon.com/harry-potter-toys/s?k=harry+potter+toys
https://www.amazon.com/harry-potter-Toys-Games/s?k=harry+potter&rh=n%3A165793011
https://www.amazon.com/harry-potter-toys-Games/s?k=harry+potter+toys&rh=n%3A165793011
https://www.amazon.com/toys/b?ie=UTF8&node=165793011
https://www.amazon.com/Toys-Games-Harry-Potter/s?rh=n%3A165793011%2Cp_lbr_characters_browse-bin%3AHarry+Potter
https://www.amazon.com/harry-potter-lego/s?k=harry+potter+lego
https://www.amazon.com/harry-potter-lego-sets/s?k=harry+potter+lego+sets
=end
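Alternatively, you can use the Google Organic Results API from SerpApi. It's a paid API with a free plan. One of the main differences is that you only need to iterate over structured JSON.
Code to integrate: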
require 'google_search_results'
params = {
api_key: ENV["API_KEY"],
engine: "google",
q: "Amazon.co.uk: Toys - Harry Potter: Toys & Games",
hl: "en"
}
search = GoogleSearch.new(params)
hash_results = search.get_hash
# [0] first element from organic results
puts hash_results[:organic_results][0][:title],
hash_results[:organic_results][0][:link]
#=> Harry Potter: Toys & Games - Amazon.co.uk
#=> https://www.amazon.co.uk/Toys-Games-Harry-Potter/s?rh=n%3A468292%2Cp_89%3AHarry+Potter
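The snippet above prints only the first organic result. As a rough sketch of iterating over the whole structure, the example below walks a hash shaped like the one used above. The SAMPLE_JSON payload and the title_link_pairs helper are hypothetical; only the organic_results, title and link keys are taken from the code above, and a real response contains many more fields.

```ruby
require 'json'

# Made-up payload with the same keys the code above reads.
SAMPLE_JSON = <<~JSON
  {
    "organic_results": [
      { "title": "First result", "link": "https://example.com/first" },
      { "title": "Second result", "link": "https://example.com/second" }
    ]
  }
JSON

# Collect [title, link] pairs; fall back to an empty list when the
# :organic_results key is absent.
def title_link_pairs(hash_results)
  hash_results.fetch(:organic_results, []).map do |result|
    [result[:title], result[:link]]
  end
end

# symbolize_names gives symbol keys, matching hash_results[:organic_results] above.
results = JSON.parse(SAMPLE_JSON, symbolize_names: true)
title_link_pairs(results).each do |title, link|
  puts title, link
end
```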
Disclaimer: I work for SerpApi.