Hi, I'm trying to scrape a web page, get the links from it, then follow each link and scrape that page too.

require 'rubygems'
require 'scrapi'
require 'uri'

Scraper::Base.parser :html_parser

web = "http://......"

def sub_web(linksubweb)

  uri = URI.parse(URI.encode(linksubweb))

end

scraper = Scraper.define do

  array :items

  # Each matched div becomes one item with a title and a link.
  process "div.mozaique>div", :items => Scraper.define {

    process "p>a", :title => :text
    process "div.thumb>a", :link => "@href"

    result :title, :link
  }
  result :items
end


  uri = URI.parse(URI.encode(web))

  scraper.scrape(uri).each do |pag|

    link_full = uri + pag.link.to_str
    puts pag.title
    sub_web(link_full)
    puts
  end

I get the following error:

e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) /Users/sss/web/app/views/admin/topics/webconector.rb
Title 1
http://mydomain/user34/top5

/Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:304:in `escape': undefined method `gsub' for #<URI::HTTP:0x007fa07cb01e08> (NoMethodError)
    from /Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:623:in `escape'
    from ../app/views/admin/topics/conectaweb.rb:11:in `sub_web'
    from ../app/views/admin/topics/conectaweb.rb:34:in `block in <top (required)>'
    from ../views/admin/topics/conectaweb.rb:29:in `each'
    from ../app/views/admin/topics/conectaweb.rb:29:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Process finished with exit code 1

1 Answer


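Try uri = URI.parse(URI.encode(linksubweb.to_s)); that should work. The problem is that URI.encode expects a String argument, and link_full is a URI::HTTP object (the result of uri + pag.link.to_str), so you have to convert it to a string first.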
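To illustrate why the conversion is needed, here is a minimal sketch using only the standard uri library. The base URL and relative path are placeholders standing in for the scraped values, and the URI.encode call is left out so the snippet also runs on newer Rubies where that method is no longer available. It shows that adding a relative path to a URI yields a URI object rather than a String, and that calling to_s before parsing avoids the NoMethodError.

```ruby
require 'uri'

# Placeholder values (assumptions), standing in for `web` and `pag.link`.
base      = URI.parse("http://mydomain/")
link_full = base + "user34/top5"   # URI#+ returns a URI object, not a String

# Accepts either a String or a URI object: to_s is a no-op on a String.
def sub_web(linksubweb)
  URI.parse(linksubweb.to_s)
end

uri = sub_web(link_full)

puts link_full.class   # URI::HTTP
puts uri               # http://mydomain/user34/top5
```

Because sub_web calls to_s itself, the caller can pass whatever it has, which is probably the simplest place to put the conversion.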

answered 2013-08-27T10:36:49.397