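I'm trying to run a local Ruby script with Mechanize that logs me in to a website, walks through about 1500 of its pages, and parses information from each one. The parsing works, but only for a while: the script runs for about 45 seconds, then stops completely and reports: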
/Users/myname/.rvm/gems/ruby-1.9.3-p374/gems/mechanize-2.7.1/lib/mechanize/http/agent.rb:306:in `fetch': 503 => Net::HTTPServiceUnavailable for http://example.com/page;53 -- unhandled response (Mechanize::ResponseCodeError)
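I can't be sure, but I suspect this is a connection timeout. I tried setting very long timeouts in my script (it may need up to 15 minutes to run), but that did not change anything. If you have any ideas, please let me know.
Here is my script: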
require 'rubygems'
require 'mechanize'
require 'open-uri'

agent = Mechanize.new
# Generous timeouts (in seconds); the whole run may take up to 15 minutes
agent.open_timeout = 1000
agent.read_timeout = 1000
# Keep only the most recent page in history to limit memory use
agent.max_history = 1

# Log in through the form that posts to /maint
page = agent.get('examplesite.com')
myform = page.form_with(:action => '/maint')
myuserid_field = myform.field_with(:id => "username")
myuserid_field.value = 'myusername'
mypass_field = myform.field_with(:id => "password")
mypass_field.value = 'mypassword'
page = agent.submit(myform, myform.buttons.first)

urlArray = [giant array of webpages here]

# Fetch each page and print the text of the textarea I am interested in
urlArray.each do |term|
  page = agent.get(term)
  page.encoding = 'windows-1252'
  puts page.parser.xpath("//tr[4]/td[2]/textarea/text()").text + 'NEWLINEHERE'
end
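
A 503 is a status the server sends back, not a timeout on the client side, so longer timeouts would not be expected to help here. One possible workaround is to catch the error and retry after a pause that grows with each failure. The following is only a minimal sketch under assumptions: `with_retries` is a hypothetical helper (not part of Mechanize), and the attempt count and delays are arbitrary illustration values, not anything the site is known to require.

```ruby
# Hypothetical helper (not part of Mechanize): run the block and, if it raises
# error_class, wait and try again. The wait doubles after each failure.
# Options (all assumed defaults, tune as needed):
#   :max_attempts - total tries before giving up (default 5)
#   :base_delay   - first wait in seconds (default 2)
#   :sleeper      - callable used to wait; defaults to Kernel#sleep
def with_retries(error_class, opts = {})
  max_attempts = opts.fetch(:max_attempts, 5)
  base_delay   = opts.fetch(:base_delay, 2)
  sleeper      = opts.fetch(:sleeper, Kernel.method(:sleep))

  attempts = 0
  begin
    attempts += 1
    yield
  rescue error_class
    # Out of attempts: re-raise the original error to the caller
    raise if attempts >= max_attempts
    # Exponential backoff: base_delay, then 2x, then 4x, ...
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Possible use inside the loop from the script above (not run here):
#
#   urlArray.each do |term|
#     page = with_retries(Mechanize::ResponseCodeError) { agent.get(term) }
#     ...
#   end
```

Note that this sketch retries on any `Mechanize::ResponseCodeError`, not only on 503; checking the error's `response_code` before retrying would narrow it. If the server is throttling requests, a short `sleep` between every `agent.get` call may also be worth trying.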