
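I'm trying to set up global exception handling so that I can add extra information when an error occurs. I have two classes, "Crawler" and "Amazon". What I want is to call "crawl", have it execute a function in Amazon, and have the exception handling in the crawl function cover that call.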

Here are my two classes:

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def crawl
    puts "crawling"

    begin
      #execute code in Amazon class here?
    rescue Exception => e
      puts "Exception: #{e.message}"
      puts "On url: #{@current_url}"
      puts e.backtrace
    end
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

class Amazon < Crawler
  #some code with errors
  def stuff
    page = get("http://www.amazon.com")
    puts page.parser.xpath("//asldkfjasdlkj").first['href']
  end
end

a = Amazon.new
a.crawl

Is there a way to call "stuff" from inside "crawl", so that the exception handling covers the whole stuff function? Is there a better way to accomplish this?

Edit: this is what I ended up with:

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def crawl
    yield
  rescue Exception => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    puts e.backtrace
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

c = Crawler.new

c.crawl do
  page = c.get("http://www.amazon.com")
  puts page.parser.xpath("//asldkfjasdlkj").first['href']
end

3 Answers


I managed to get the functionality I wanted by using "super" and a placeholder method. Is there a better way to do this?

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def stuff
  end

  def crawl
    stuff
  rescue Exception => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    puts e.backtrace
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

class Amazon < Crawler
  #some code with errors
  def stuff
    super
    page = get("http://www.amazon.com")
    puts page.parser.xpath("//asldkfjasdlkj").first['href']
  end
end

a = Amazon.new
a.crawl
answered 2012-04-06T22:53:10.507

You can have crawl accept a code block:

def crawl
  begin
    yield
  rescue Exception => e
    # handle exception
  end
end

def stuff
  crawl do
    # implementation of stuff
  end
end

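I'm not crazy about methods with no body. A code block probably makes more sense here. Depending on what you want to do, it may also eliminate the need for subclassing.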
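
A minimal sketch of the block-based version without any subclass. To keep it runnable without network access, `get` here only records the URL and returns a placeholder string; that stand-in is an assumption, and the real method would call `@agent.get(url)`. The sketch also rescues `StandardError` instead of `Exception`, because `rescue Exception` additionally swallows `Interrupt` and `SystemExit`. That is a choice made for this example, not something the code above does.

```ruby
class Crawler
  attr_reader :current_url

  # Runs the given block and reports any StandardError with the last URL.
  # Returns the block's value, or nil if an error was rescued.
  def crawl
    yield self
  rescue StandardError => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    nil
  end

  # Stand-in for the Mechanize call (assumption): records the URL
  # and returns a fake page body instead of fetching anything.
  def get(url)
    @current_url = url
    "<html></html>"
  end
end

crawler = Crawler.new

# The block fails after the URL has been recorded, so the rescue
# can report which URL was being processed.
result = crawler.crawl do |c|
  c.get("http://www.amazon.com")
  raise "no matching link"
end
```

Passing the crawler into the block (`yield self`) means the calling code does not have to close over an outer variable, though closing over it as in the question's edit works just as well.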

answered 2012-04-06T23:24:53.700

If you want to do it another way, take a look at the "Strategy" design pattern:

# test_mach.rb
require 'rubygems'
require 'mechanize'

# this is the context class, which calls the different strategy implementations
class Crawler
  def initialize(some_website_strategy)
    @strategy = some_website_strategy
  end

  def crawl
    begin
      @strategy.crawl
      #execute code in Amazon class here?
    rescue Exception => e
      puts "==== starts this exception comes from Parent Class"
      puts e.backtrace
      puts "==== ends  this exception comes from Parent Class"
    end
  end
end

# strategy class for Amazon
class Amazon
  def crawl
    puts "now crawling amazon"
    raise "oh ... some errors when crawling amazon"
  end
end

# strategy class for taobao.com 
class Taobao
  def crawl
    puts "now crawling taobao"
    raise "oh ... some errors when crawling taobao"
  end
end

Then run this code:

amazon = Crawler.new(Amazon.new)
amazon.crawl
taobao = Crawler.new(Taobao.new)
taobao.crawl

Result:

now crawling amazon
==== starts this exception comes from Parent Class
test_mach.rb:27:in `crawl'
test_mach.rb:13:in `crawl'
test_mach.rb:38
==== ends  this exception comes from Parent Class
now crawling taobao
==== starts this exception comes from Parent Class
test_mach.rb:34:in `crawl'
test_mach.rb:13:in `crawl'
test_mach.rb:40
==== ends  this exception comes from Parent Class

By the way, for your own implementation I would do basically the same as you did, except that the placeholder method raises:

# my implementation
class Crawler
  def stuff
    raise "abstract method called"
  end
end 
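
A fuller sketch of that variant, under some assumptions: `get` is replaced by a stand-in that records the URL and returns an empty hash instead of a Mechanize page, `crawl` returns the symbol `:failed` after a rescued error so the outcome can be checked, and it rescues `StandardError` rather than `Exception`. None of those details come from the original code.

```ruby
class Crawler
  attr_reader :current_url

  # Template method: runs the subclass's stuff inside a single rescue.
  def crawl
    stuff
  rescue StandardError => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    :failed
  end

  # Subclasses are expected to override this method.
  def stuff
    raise "abstract method called"
  end

  # Stand-in for @agent.get(url) (assumption): records the URL and
  # returns an empty hash playing the role of a page with no links.
  def get(url)
    @current_url = url
    {}
  end
end

class Amazon < Crawler
  # Overrides stuff without calling super, since super would raise.
  def stuff
    page = get("http://www.amazon.com")
    puts page.fetch("href") # raises KeyError: the fake page has no "href"
  end
end

base_result = Crawler.new.crawl # rescued: "abstract method called"
amazon = Amazon.new
amazon_result = amazon.crawl    # rescued: KeyError from the fake page
```

Because the base `stuff` raises, a subclass that forgets to override it fails loudly inside `crawl` instead of silently doing nothing.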

If you want yet another way, take a look at "Around Alias" ("Metaprogramming Ruby", page 155). However, I think "Around Alias" is the reverse case of Strategy.

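(I mean:

  • In the Strategy pattern, you implement the "context" class first, and then you need to implement the "strategy" classes.
  • With the "Around Alias" spell, you implement a "strategy" first, and then write a new class that "around aliases" that "strategy"...

Err... I hope I haven't confused you ^_^)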
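
A rough sketch of what an Around Alias could look like for the `Amazon` class from the Strategy example. It reopens the existing class rather than writing a separate one, the alias name `crawl_without_rescue` is made up for illustration, and the `:failed` return value is added only so the outcome can be checked.

```ruby
# The existing "strategy" class, written first.
class Amazon
  def crawl
    puts "now crawling amazon"
    raise "oh ... some errors when crawling amazon"
  end
end

# Later, reopen the class and wrap the original method.
class Amazon
  # Keep the original implementation reachable under a second name
  # (hypothetical name, chosen for this sketch).
  alias_method :crawl_without_rescue, :crawl

  # Redefine crawl so that it calls the original inside a rescue.
  def crawl
    crawl_without_rescue
  rescue StandardError => e
    puts "Exception: #{e.message}"
    puts e.backtrace
    :failed
  end
end

result = Amazon.new.crawl
```

Callers keep using `crawl` as before; the exception handling is layered on afterwards without touching the original method body.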

answered 2012-04-07T00:21:36.800