Hi, I'm trying to scrape a web page, get the links from it, then follow each link and scrape that page as well.
require 'rubygems'
require 'scrapi'
require 'uri'

Scraper::Base.parser :html_parser

web = "http://......"

# Parse the link of a sub-page (this is where the error below is raised).
def sub_web(linksubweb)
  uri = URI.parse(URI.encode(linksubweb))
end

# Collect the title and link of every item on the listing page.
scraper = Scraper.define do
  array :items
  process "div.mozaique>div", :items => Scraper.define {
    process "p>a", :title => :text
    process "div.thumb>a", :link => "@href"
    result :title, :link
  }
  result :items
end

uri = URI.parse(URI.encode(web))
scraper.scrape(uri).each do |pag|
  # Resolve the item's href against the base URI.
  link_full = uri + pag.link.to_str
  puts pag.title
  sub_web(link_full)
  puts
end
I get the following error:
e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) /Users/sss/web/app/views/admin/topics/webconector.rb
Title 1
http://mydomain/user34/top5
/Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:304:in `escape': undefined method `gsub' for #<URI::HTTP:0x007fa07cb01e08> (NoMethodError)
from /Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:623:in `escape'
from ../app/views/admin/topics/conectaweb.rb:11:in `sub_web'
from ../app/views/admin/topics/conectaweb.rb:34:in `block in <top (required)>'
from ../views/admin/topics/conectaweb.rb:29:in `each'
from ../app/views/admin/topics/conectaweb.rb:29:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Process finished with exit code 1
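
One possible reading of the trace: `uri + pag.link.to_str` returns a `URI::HTTP` object rather than a String, and `URI.encode` then calls `gsub` on it, which only Strings have. The sketch below illustrates that idea with the standard `uri` library only. The helper name `to_uri` is hypothetical, and it assumes the link is already properly escaped, so it skips the `URI.encode` step entirely.

```ruby
require 'uri'

# Hypothetical helper (not part of scrapi): accepts either a String
# or an already-parsed URI and always returns a URI object.
def to_uri(link)
  return link if link.is_a?(URI::Generic)  # already parsed, nothing to do
  URI.parse(link.to_s)                     # assumes the string is already escaped
end

base = URI.parse("http://mydomain/")

# URI#+ resolves a relative reference and returns a URI, not a String.
link_full = base + "/user34/top5"

puts link_full.class
puts to_uri(link_full)
puts to_uri("http://mydomain/user34/top5")
```

If this reading is right, `sub_web` could either take the URI as it is, or be given `link_full.to_s` when a String is really needed.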