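I manage a Rails application for one of my clients, and it recently went down. The site had been down for 9 hours before I noticed. I checked the logs, and every request during those 9 hours was logged with the following error: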
at=error code=H10 desc="App crashed"
Just before that, I see the following in the logs:
2012-11-16T00:55:46+00:00 heroku[web.1]: Idling
2012-11-16T00:55:50+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]: [2012-11-16 00:55:51] ERROR SignalException: SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]: /usr/local/lib/ruby/1.9.1/webrick/server.rb:90:in `select'
2012-11-16T00:56:00+00:00 heroku[web.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2012-11-16T00:56:00+00:00 heroku[web.1]: Stopping remaining processes with SIGKILL
2012-11-16T00:56:02+00:00 heroku[web.1]: State changed from up to down
2012-11-16T00:56:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:03:55+00:00 heroku[web.1]: Unidling
2012-11-16T01:03:55+00:00 heroku[web.1]: State changed from down to starting
2012-11-16T01:03:59+00:00 heroku[web.1]: Starting process with command `bundle exec rails server -p 4303`
2012-11-16T01:04:00+00:00 heroku[nginx]: 98.139.241.251 - - [16/Nov/2012:01:04:00 +0000] "GET / HTTP/1.1" 499 0 "-" "YahooCacheSystem" domain.com
2012-11-16T01:04:22+00:00 app[web.1]: => Ctrl-C to shutdown server
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Dispatcher: webrick
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Application: acsolar
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : New Relic Ruby Agent 3.4.0.1 Initialized: pid = 2
2012-11-16T01:04:22+00:00 app[web.1]: => Booting WEBrick
2012-11-16T01:04:22+00:00 app[web.1]: => Rails 3.1.1 application starting in production on http://0.0.0.0:4303
2012-11-16T01:04:22+00:00 app[web.1]: => Call with -d to detach
2012-11-16T01:04:25+00:00 app[web.1]: [DEPRECATION] Your applications public directory contains an assets/products and/or assets/taxons subdirectory.
2012-11-16T01:04:25+00:00 app[web.1]: Run `rake spree:assets:relocate_images` to relocate the images.
2012-11-16T01:04:34+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:32 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Reporting performance data every 60 seconds.
2012-11-16T01:04:34+00:00 app[web.1]: Connected to NewRelic Service at collector-5.newrelic.com
2012-11-16T01:05:00+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2012-11-16T01:05:00+00:00 heroku[web.1]: Stopping process with SIGKILL
2012-11-16T01:05:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from crashed to down
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from starting to crashed
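My guess is that the app was idled (spun down) and then hit an error while starting back up, but why did it stay in the crashed state instead of restarting? If this happens again, what can I do to make it restart automatically?
I also have NewRelic running on it, and it never notified me at all, but that is a separate problem I need to look into.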
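Until the notification side is sorted out, the kind of check I have in mind is roughly the following. It is only a minimal sketch: it scans log lines for Heroku error codes in the two formats that appear in the logs above (`code=H10` and `Error R10 (...)`) and counts them. The method name `heroku_error_codes` and the regular expression are my own assumptions for illustration, not part of any Heroku or NewRelic API, and the sample lines are copied from the logs above.

```ruby
# Matches Heroku error codes in the two formats seen in the logs above:
#   router lines: at=error code=H10 desc="App crashed"
#   dyno lines:   Error R10 (Boot timeout) -> ...
# The pattern is an assumption based only on these samples.
ERROR_CODE = /(?:\bcode=|\bError )([HRL]\d{2})\b/

# Hypothetical helper: returns a hash of error code => number of occurrences.
def heroku_error_codes(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = ERROR_CODE.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Sample lines taken from the logs above.
sample = [
  'at=error code=H10 desc="App crashed"',
  '2012-11-16T00:56:00+00:00 heroku[web.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM',
  '2012-11-16T01:05:00+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch',
  '2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from starting to crashed'
]

p heroku_error_codes(sample)
```

Something like this could run against a periodic log dump and flag any non-empty result, but it only detects the crash. It does not answer why the app stayed crashed or how to get it restarted.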