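This is what I have so far. The problem is that it generates a JSON file that looks like the one below. When I inspect the page's markup, I can't see anything distinctive about the CSS selectors: they are all just tr td a. Any tips would be greatly appreciated.
Thanks!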
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'json'

sammiches = Nokogiri::HTML(open("http://en.wikipedia.org/wiki/List_of_sandwiches"))

class Scraper
  def initialize
    @url = "http://en.wikipedia.org/wiki/List_of_sandwiches"
    @nodes = Nokogiri::HTML(open(@url))
  end

  def summary(filename)
    sammich_data = @nodes
    sammiches = sammich_data.css('div.mw-content-ltr table.wikitable tr')
    sammich_hashes = sammiches.map {|x|
      name = x.css('td a').text
      image = x.css('td a.image').text
      country = x.css('td a').text
      description = x.css('td a').text
      {
        :name => name,
        :image => image,
        :country => country,
        :description => description,
      }
    }
    File.open("public/#{filename}","w") do |f|
      f.write(JSON.pretty_generate(sammich_hashes))
    end
  end
end

sammy = Scraper.new
puts sammy.summary('listy')
Part of the JSON file output:
[
  {
    "name": "",
    "image": "",
    "country": "",
    "description": ""
  },
  {
    "name": "BaconUnited Kingdomketchupbrown sauce",
    "image": "",
    "country": "BaconUnited Kingdomketchupbrown sauce",
    "description": "BaconUnited Kingdomketchupbrown sauce"
  },
  {
    "name": "Bacon, egg and cheesebreakfast sandwich",
    "image": "",
    "country": "Bacon, egg and cheesebreakfast sandwich",
    "description": "Bacon, egg and cheesebreakfast sandwich"