8

我为我的一个客户管理一个 Rails 应用程序,最近它出现了故障。在我注意到之前,该网站已关闭 9 小时。我检查了日志,过去 9 小时的每个请求都带有以下代码:

at=error code=H10 desc="App crashed"

在此之前,我看到以下日志:

2012-11-16T00:55:46+00:00 heroku[web.1]: Idling
2012-11-16T00:55:50+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]: [2012-11-16 00:55:51] ERROR SignalException: SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]:   /usr/local/lib/ruby/1.9.1/webrick/server.rb:90:in `select'
2012-11-16T00:56:00+00:00 heroku[web.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2012-11-16T00:56:00+00:00 heroku[web.1]: Stopping remaining processes with SIGKILL
2012-11-16T00:56:02+00:00 heroku[web.1]: State changed from up to down
2012-11-16T00:56:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:03:55+00:00 heroku[web.1]: Unidling
2012-11-16T01:03:55+00:00 heroku[web.1]: State changed from down to starting
2012-11-16T01:03:59+00:00 heroku[web.1]: Starting process with command `bundle exec rails server -p 4303`
2012-11-16T01:04:00+00:00 heroku[nginx]: 98.139.241.251 - - [16/Nov/2012:01:04:00 +0000] "GET / HTTP/1.1" 499 0 "-" "YahooCacheSystem" domain.com
2012-11-16T01:04:22+00:00 app[web.1]: => Ctrl-C to shutdown server
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Dispatcher: webrick
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Application: acsolar
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : New Relic Ruby Agent 3.4.0.1 Initialized: pid = 2
2012-11-16T01:04:22+00:00 app[web.1]: => Booting WEBrick
2012-11-16T01:04:22+00:00 app[web.1]: => Rails 3.1.1 application starting in production on http://0.0.0.0:4303
2012-11-16T01:04:22+00:00 app[web.1]: => Call with -d to detach
2012-11-16T01:04:25+00:00 app[web.1]: [DEPRECATION] Your applications public directory contains an assets/products and/or assets/taxons subdirectory. 
2012-11-16T01:04:25+00:00 app[web.1]:     Run `rake spree:assets:relocate_images` to relocate the images.
2012-11-16T01:04:34+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:32 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Reporting performance data every 60 seconds.
2012-11-16T01:04:34+00:00 app[web.1]: Connected to NewRelic Service at collector-5.newrelic.com
2012-11-16T01:05:00+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2012-11-16T01:05:00+00:00 heroku[web.1]: Stopping process with SIGKILL
2012-11-16T01:05:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from crashed to down
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from starting to crashed

我猜它可能已经旋转并在启动备份时出错,但是它为什么会一直处于崩溃状态而没有重新启动呢?如果将来再次发生这种情况,我能做些什么让它自动重启?

我也让 NewRelic 在上面运行,它根本没有通知我,但这是我必须调查的另一个问题。

4

1 回答 1

4

Heroku 的支持答案建议使用heroku restart. 他们现在正在解决问题。

嗨,我们这边的进程管理错误导致一些崩溃的应用程序只运行 1 个 web dyno 被报告为“空闲”,即使它们实际上已经崩溃了。这意味着崩溃的测功机从未重新启动,导致后续请求失败。我们已发现此问题并正在实施修复。如果您的应用仍然没有响应,请尝试使用 heroku restart 命令重新启动它。如果您需要更多帮助,请告诉我们。谢谢,Heroku 支持

于 2012-12-01T14:09:32.567 回答