在运行 Sidekiq 时,我看到大量作业失败,这些作业都与我的 Mongo 数据库的连接问题有关。我正在用大量负载对机器进行压力测试,因此我排队了超过 18,000 个作业,每个作业在失败时重试 5 秒。一些工作(我猜是那些能够成功检索连接线程的工作人员)工作得很好。然后我有很多其他人有这样的错误:
2013-12-26T19:25:35Z 13447 TID-m2biw WARN: {"retry"=>true, "queue"=>"default",
"class"=>"ParseFileWorker", "args"=>[{"filename"=>"/opt/media/AE-67452_36.XML"}],
"jid"=>"5d6c48fa94ed8c8da1b226fc", "enqueued_at"=>1388084903.6113343,
"error_message"=>"Could not connect to a primary node for replica set #
<Moped::Cluster:44552140 @seeds=[<Moped::Node resolved_address=\"127.0.0.1:27017\">]>",
"error_class"=>"Moped::Errors::ConnectionFailure", "failed_at"=>"2013-12-26T19:16:25Z",
"retry_count"=>3, "retried_at"=>2013-12-26 19:25:35 UTC}
还有超时错误,如下所示:
Timeout::Error: Waited 0.5 sec
请注意,我正在运行带有从https://github.com/mongoid/mongoid的主分支签出的 Mongoid 代码的 Rails 4 。根据我的阅读,以前版本的 Mongoid 需要在完成 Sidekiq 作业处理时显式关闭连接。Mongoid 4 应该自动执行此操作。我无法确认它是否正在这样做。当作业排队太快连接不可用或超时时,问题似乎是双重的。一些工人成功地打开了连接。有些作业必须等到重试才能解析。
我不完全确定这里最好的解决方案是什么。如果我慢慢地排队工作,一切都会很好。不幸的是,这不是应用程序在现实世界中的运作方式。
我如何确保我的 sidekiq 工作获得完成工作所需的 Moped/Mongoid 连接,或者等到有一个可以处理?由于这个问题,看到了大量的异常/错误。下面是常见的重复错误的堆栈跟踪:
2013-12-26T20:56:58Z 1317 TID-i70ms ParseFileWorker JID-0fc375d8fd9707e7d3ec3238 INFO: fail: 1.507 sec
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: {"retry"=>true, "queue"=>"default", "class"=>"ParseFileWorker", "args"=>[{"filename"=>"/opt/media/AC-19269_287.XML"}], "jid"=>"0fc375d8fd9707e7d3ec3238", "enqueued_at"=>1388091410.0421875, "error_message"=>"Waited 0.5 sec", "error_class"=>"Timeout::Error", "failed_at"=>2013-12-26 20:56:58 UTC, "retry_count"=>0}
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: Waited 0.5 sec
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: /media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:35:in `block (2 levels) in pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:31:in `loop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:31:in `block in pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:30:in `synchronize'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:30:in `pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool.rb:66:in `checkout'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool.rb:53:in `with'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:114:in `connection'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:141:in `disconnect'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:156:in `down!'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:442:in `rescue in refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:431:in `refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:182:in `block in refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:194:in `each'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:194:in `refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:151:in `nodes'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:240:in `with_primary'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/primary.rb:55:in `block in with_node'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/selectable.rb:65:in `call'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/selectable.rb:65:in `with_retry'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/primary.rb:54:in `with_node'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/query.rb:127:in `first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:201:in `block (2 levels) in first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:537:in `with_sorting'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:200:in `block in first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:449:in `try_cache'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:199:in `first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual.rb:19:in `first'
/media/apps/tsn-parser-server/app/models/parser/core.rb:88:in `prepare_parse'
** 更新 **
如果我将 sidekiq 工作人员的数量减少到 10 个左右,那么问题就不会发生了——仅仅是因为只有 10 个线程在运行的情况下打开与 MongoDB 的连接的工作人员较少。必须有一种方法可以在 sidekiq 中为 Mongoid/Moped 使用连接池。由于 Rails 负责建立和断开连接,我不知道如何实现这一点。
** 更新 2 **
当我把它指向我们的生产服务器时,它变得更糟了 10 倍。几乎每个线程都是一个Timeout::Error
. 有些人通过了,但根本没有多少。这太可怕了。当 MongoDB 在本地时,至少可以通过将 sidekiq 限制为 10 个工作人员来“解决”这个问题。但这不是一个现实的用例,我需要能够连接到另一台机器上的数据库。