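I found a gem I'd like to use, but I can't get it to work. The gem is called CobWeb. It looks pretty slick.

I already have Redis and Resque working. I created a new queue intended to make use of CobWeb.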

class Crawler
  @queue = :crawler_queue
  def self.perform(site_id)
    site = Site.find_by_id(site_id)
    crawler = CobWeb.new(follow_redirects: false, internal_urls: true)
    crawler.start(site.homepage)

    # Ultimately I want a list of URLs, but at this stage I just want to
    # see what data I get back from the crawler.
    puts crawler
  end
end

The problem is that when I try to run the rake task for the queue, I get this error. I'm not sure how to resolve it. Any suggestions?

rake resque:work QUEUE='*' --trace
** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup 
** Execute resque:work
rake aborted!
tried to create Proc object without a block
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1197:in `define_method'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1197:in `generate_method'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1206:in `compile!'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1186:in `route'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1168:in `get'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/sinatra-1.3.2/lib/sinatra/base.rb:1602:in `block (2 levels) in delegate'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/redis-namespace-1.2.1/lib/redis/namespace.rb:257:in `method_missing'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/resque-1.21.0/lib/resque/worker.rb:444:in `job'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/resque-1.21.0/lib/resque/worker.rb:377:in `unregister_worker'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/resque-1.21.0/lib/resque/worker.rb:159:in `ensure in work'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/resque-1.21.0/lib/resque/worker.rb:159:in `work'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/gems/resque-1.21.0/lib/resque/tasks.rb:34:in `block (2 levels) in <top (required)>'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/Users/bendowney/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:116:in `invoke_task'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block (2 levels) in top_level'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `each'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block in top_level'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:88:in `top_level'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:66:in `block in run'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/lib/rake/application.rb:63:in `run'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/gems/rake-0.9.2.2/bin/rake:33:in `<top (required)>'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/bin/rake:19:in `load'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@global/bin/rake:19:in `<main>'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/bin/ruby_noexec_wrapper:14:in `eval'
/Users/bendowney/.rvm/gems/ruby-1.9.3-p194@RobotWatcherApp/bin/ruby_noexec_wrapper:14:in `<main>'
Tasks: TOP => resque:work

1 Answer


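I can't seem to pin this down exactly; however, I've just released a new version of cobweb, v0.0.64, created a new Rails app, and run the sample code against it.

You can get the sample app at http://github.com/stewartmckee/cobweb_sample

The only thing I can see above that is outright wrong is that :internal_urls should be an array, e.g.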

internal_urls: ["http://www.google.com/folder1/", "http://www.google.com/folder2/", "http://www.otherdomain.com/*"]

This will only allow URLs matching those patterns to be processed.
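
As a rough illustration of what filtering by URL pattern means here, the sketch below is plain Ruby and is not cobweb's actual implementation. The helper names `pattern_to_regexp` and `internal_url?` are hypothetical, and it assumes that `*` matches any run of characters and that a pattern without `*` must match the whole URL; the gem may behave differently.

```ruby
# Hypothetical helpers for illustration only; they are NOT part of the cobweb gem.

# Turn a wildcard pattern such as "http://www.otherdomain.com/*" into a Regexp.
# Assumption: "*" stands for any run of characters; everything else is literal.
def pattern_to_regexp(pattern)
  # Escape regex metacharacters, then turn the escaped "\*" back into ".*".
  escaped = Regexp.escape(pattern).gsub('\*', '.*')
  # Anchor at both ends so the whole URL has to match the pattern.
  Regexp.new("\\A#{escaped}\\z")
end

# True when the URL matches at least one of the patterns.
def internal_url?(url, patterns)
  patterns.any? { |pattern| pattern_to_regexp(pattern).match(url) }
end

internal_urls = [
  "http://www.google.com/folder1/",
  "http://www.google.com/folder2/",
  "http://www.otherdomain.com/*"
]

candidates = [
  "http://www.google.com/folder1/",
  "http://www.google.com/folder3/",
  "http://www.otherdomain.com/about"
]

# Keep only the URLs that one of the patterns allows.
allowed = candidates.select { |url| internal_url?(url, internal_urls) }
puts allowed
```

Under these assumptions only the first and third candidates are kept, because `folder3/` matches none of the listed patterns.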

Take a look at the sample site and try running it in your environment to make sure this isn't an environment issue.

Stewart.

answered 2012-08-20T01:04:27.857