ruby-on-rails-3 - nginx + Unicorn (Rails 3 app) returns gateway errors under load

Question

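I have a Rails (3.2) app that runs on nginx and Unicorn on a cloud platform. The "box" runs Ubuntu 12.04.

When the system load is at about 70% or above, nginx abruptly (and seemingly at random) starts throwing 502 Bad Gateway errors; at lower load nothing of the kind happens. I have experimented with various numbers of cores (4, 6, 10; I can "change hardware" because it is on a cloud platform), and the situation is always the same. (CPU load is similar to the system load: userland is about 55%, the rest is system and stolen, with plenty of free memory and no swapping.)

The 502s usually come in batches, but not always.

(I run one Unicorn worker per core, and one or two nginx workers. See the relevant parts of the configs below, as used when running on 10 cores.)

I don't really know how to track down the cause of these errors. I suspect it may have something to do with the Unicorn workers not being able to serve requests (in time?), but that seems odd, because they do not appear to saturate the CPU and I see no reason why they would be waiting for IO (though I don't know how to verify that either).

Could you please help me figure out how to find the cause?

Unicorn config (unicorn.rb):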

worker_processes 10
working_directory "/var/www/app/current"
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
listen 2007, :tcp_nopush => true
timeout 90
pid "/var/www/app/current/tmp/pids/unicorn.pid"
stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
stdout_path "/var/www/app/shared/log/unicorn.stdout.log"
preload_app true
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true
check_client_connection false

before_fork do |server, worker|
  ... I believe the stuff here is irrelevant ...
end
after_fork do |server, worker|
  ... I believe the stuff here is irrelevant ...
end

And the nginx configs:

/etc/nginx/nginx.conf:

worker_processes 2;
worker_rlimit_nofile 2048;
user www-data www-admin;
pid /var/run/nginx.pid;
error_log /var/log/nginx/nginx.error.log info;

events {
  worker_connections 2048;
  accept_mutex on; # "on" if nginx worker_processes > 1
  use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    # optimization efforts
    client_max_body_size        2m;
    client_body_buffer_size     128k;
    client_header_buffer_size   4k;
    large_client_header_buffers 10 4k;  # one for each core or one for each unicorn worker?
    client_body_temp_path       /tmp/nginx/client_body_temp;

    include /etc/nginx/conf.d/*.conf;
}

/etc/nginx/conf.d/app.conf:

sendfile on;
tcp_nopush on;
tcp_nodelay off;
gzip on;
gzip_http_version 1.0;
gzip_proxied any;
gzip_min_length 500;
gzip_disable "MSIE [1-6]\.";
gzip_types text/plain text/css text/javascript application/x-javascript;

upstream app_server {
  # fail_timeout=0 means we always retry an upstream even if it failed
  # to return a good HTTP response (in case the Unicorn master nukes a
  # single worker for timing out).
  server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
}

server {
  listen 80 default deferred;
  server_name _;
  client_max_body_size 1G;
  keepalive_timeout 5;
  root /var/www/app/current/public;

  location ~ "^/assets/.*" {
      ...
  }

  # Prefer to serve static files directly from nginx to avoid unnecessary
  # data copies from the application server.
  try_files $uri/index.html $uri.html $uri @app;

  location @app {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    proxy_pass http://app_server;

    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size          128k;
    proxy_buffers              10 256k;  # one per core or one per unicorn worker?
    proxy_busy_buffers_size    256k;
    proxy_temp_file_write_size 256k;
    proxy_max_temp_file_size   512k;
    proxy_temp_path            /mnt/data/tmp/nginx/proxy_temp;

    open_file_cache max=1000 inactive=20s; 
    open_file_cache_valid    30s; 
    open_file_cache_min_uses 2;
    open_file_cache_errors   on;
  }
}

score 22 · Accepted Answer

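After googling the expressions found in the nginx error log, it turned out to be a known issue that has nothing to do with nginx, little to do with Unicorn, and is rooted in OS (Linux) settings.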

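The core of the problem is that the socket backlog is too short. How long it should be depends on various considerations (whether you want to detect cluster member failure as soon as possible, or keep the application pushing against its load limits). Either way, the listen :backlog setting needs tweaking.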

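I found that in my case a listen ... :backlog => 2048 was sufficient. (I did not experiment much, although there is a good hack for doing so if you like: use two sockets for communication between nginx and Unicorn, with different backlogs and the longer one acting as backup, then check the nginx log to see how often the shorter queue fails.) Please note that this number is not the result of a scientific calculation, and YMMV.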
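A minimal sketch of that change, reusing the socket path from the question's unicorn.rb. The second listen line is only needed for the optional two-socket experiment; its file name (unicorn-short.sock) and its backlog of 64 are assumptions made for illustration.

```ruby
# unicorn.rb (fragment): only the listen lines change.

# Main socket with the longer backlog reported as sufficient above.
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 2048

# Optional, for the two-socket experiment: a second socket with a short
# backlog. The file name and the value 64 are hypothetical.
listen "/var/www/app/current/tmp/sockets/unicorn-short.sock", :backlog => 64
```

For the experiment, the short-backlog socket would be listed as the primary server in the nginx upstream block and the long-backlog socket would be marked backup, so the nginx error log shows how often the short queue overflows.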

Note, however, that many operating systems (most Linux distros, Ubuntu 12.04 included) have much lower OS-level default limits on socket backlog sizes (as low as 128).
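To see why the OS limit matters: on Linux, a backlog larger than net.core.somaxconn is silently truncated to that value. The sketch below reads the current limit from /proc and reports the backlog that would actually apply; the helper names (effective_backlog, current_somaxconn) are made up for this example.

```ruby
# Effective listen backlog on Linux: the kernel silently caps the value
# passed to listen(2) at net.core.somaxconn.
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"

# Returns the backlog the kernel will actually use.
def effective_backlog(requested, somaxconn)
  [requested, somaxconn].min
end

# Reads the current kernel limit, or returns nil when the file is not
# readable (for example on a non-Linux system).
def current_somaxconn(path = SOMAXCONN_PATH)
  File.readable?(path) ? File.read(path).to_i : nil
end

if __FILE__ == $0
  requested = 2048 # the :backlog value set in unicorn.rb
  limit = current_somaxconn
  if limit
    puts "requested=#{requested} somaxconn=#{limit} " \
         "effective=#{effective_backlog(requested, limit)}"
  else
    puts "cannot read #{SOMAXCONN_PATH}"
  end
end
```

With a limit of 128, a requested backlog of 2048 is effectively 128, which is why the Unicorn setting alone is not enough.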

You can change the OS limits as follows (as root):

sysctl -w net.core.somaxconn=2048
sysctl -w net.core.netdev_max_backlog=2048

Add these settings to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting by running sysctl -p.)

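There are mentions that you may also have to increase the maximum number of files a process can open (use ulimit -n, and /etc/security/limits.conf to make it permanent). I had already done that for other reasons, so I cannot tell whether it makes a difference.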
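If you want to check the open-file limit from inside a Ruby process (for instance from a console on the app server), a small sketch using the standard Process.getrlimit call could look like this. The helper name nofile_limits is made up for the example.

```ruby
# Reports the open-file limits of the current process, roughly what
# `ulimit -n` (soft) and `ulimit -Hn` (hard) show in a shell.
def nofile_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

if __FILE__ == $0
  limits = nofile_limits
  puts "open files: soft=#{limits[:soft]} hard=#{limits[:hard]}"
end
```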
