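I'm getting R12 (Exit timeout) errors for a Heroku app running unicorn and sidekiq. These errors occur 1-2 times a day and every time I deploy. I understand that I need to translate Heroku's shutdown signal so that unicorn responds to it correctly, but I thought I had already done that in the unicorn config below: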

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

The logs around the error look like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

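It looks like all of the child processes were successfully reaped before the timeout. Is it possible that the master is still alive? Also, should the router still be sending web requests to the dyno during shutdown, as the logs show?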

FWIW, I'm using Heroku's zero-downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).

1 Answer

I think your custom signal handling is what's causing the timeouts here.

EDIT: I've been downvoted for disagreeing with Heroku's documentation, and I'd like to address that.

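Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.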

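Heroku seems to argue that catching the TERM signal and translating it into a QUIT signal is the correct way to turn a hard shutdown into a graceful shutdown.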

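However, doing so seems to introduce the risk of no shutdown at all in some cases, which is the root of this bug. Users experiencing hanging dynos running Unicorn should weigh the evidence and make their own decision from first principles, not just from documentation.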
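To illustrate the general risk (not Unicorn's actual internals), here is a minimal sketch in plain Ruby. It forks a child that either swallows TERM with an empty trap or leaves the default handling in place, sends it TERM, and checks whether it is still running shortly afterwards. The helper name `survives_term?` and the 0.5 second wait are my own assumptions for the demo, and it needs a platform with `fork` (Linux or macOS).

```ruby
# Demonstration only: a process that traps TERM and does nothing keeps
# running until something sends it KILL. `survives_term?` is a hypothetical
# helper for this sketch, not part of Unicorn or Heroku.
def survives_term?(swallow:, wait_seconds: 0.5)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # An empty handler swallows TERM, like the worker trap in the question.
    Signal.trap('TERM') { } if swallow
    writer.write('r') # tell the parent the trap (if any) is installed
    writer.close
    loop { sleep 1 }  # stand-in for a process that never exits on its own
  end

  writer.close
  reader.read(1)      # block until the child is ready
  reader.close

  Process.kill('TERM', pid)
  sleep wait_seconds

  # With WNOHANG, waitpid returns nil while the child is still running.
  still_running = Process.waitpid(pid, Process::WNOHANG).nil?

  if still_running
    # Clean up the way the platform eventually would: with KILL.
    Process.kill('KILL', pid)
    Process.wait(pid)
  end

  still_running
end

puts "swallowing TERM: still running = #{survives_term?(swallow: true)}"
puts "default TERM:    still running = #{survives_term?(swallow: false)}"
```

If a process in the dyno ends up in the first situation for any reason, the only thing left that can stop it is the SIGKILL that follows the 10 second window, which is what R12 reports.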

Answered 2013-12-01T18:50:40.703