ruby-on-rails - Lots of threads under Unicorn

Question

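I'm debugging some Postgres connection leaks in our application. A few days ago we suddenly went over 100 connections, which should not happen, since we only have 8 Unicorn workers and one Sidekiq process (25 threads).

I was looking at htop today and saw that my Unicorn workers were spawning a large number of threads. For example: (htop screenshot not preserved)

Am I reading this right? This shouldn't be happening, should it? If these are spawned threads, any idea how to debug this?

Thanks! By the way, this is my other question: "Debugging unicorn postgres connection leak".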

Edit

I just followed some tips from http://varaneckas.com/blog/ruby-tracing-threads-unicorn/ and when I print the stack traces from a worker's threads, this is what I get when there are lots of threads:

[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] -------------------
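
For readers who want to produce a dump shaped like the one above, here is a minimal sketch using only Ruby's standard Thread API. It is not the code from the linked blog post; the helper name dump_thread_backtraces and the exact output format are assumptions made for illustration.

```ruby
# Hypothetical helper: writes one backtrace block per live thread,
# each line prefixed with the process id, similar to the dump above.
def dump_thread_backtraces(io = $stderr)
  pid = Process.pid
  Thread.list.each do |thread|
    io.puts "[#{pid}] ---"
    # Thread#backtrace returns nil for a thread that is no longer alive.
    (thread.backtrace || []).each { |line| io.puts "[#{pid}] #{line}" }
  end
  io.puts "[#{pid}] -------------------"
end
```

Calling it from inside a worker (for example from a signal handler of your choosing) prints every thread's current position, which is how many identical frames ending in `pop' become visible at once.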

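Here is my unicorn.rb, which spawns a thread for EventMachine in after_fork: https://gist.github.com/steverob/b83e41bb49d78f9aa32f79136df5af5f

The reason for EventMachine is this: https://github.com/keenlabs/keen-gem#asynchronous-publishing

Is this normal? Shouldn't the threads get killed? Could this also cause unnecessary database connections to be opened? Thanks.

Update: I just found out that I am using an older version of the PubNub gem that uses EM, and I came across these lines in the pubnub.log file: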

D, [2016-04-06T21:31:12.130123 #1573] DEBUG -- pubnub: Created event Pubnub::Publish
D, [2016-04-06T21:31:12.130144 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire
D, [2016-04-06T21:31:12.130162 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Adding event to async_events
D, [2016-04-06T21:31:12.130178 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Starting railgun
D, [2016-04-06T21:31:12.130194 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | starting EM in new thread
D, [2016-04-06T21:31:12.130243 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | We aren't running on thin
D, [2016-04-06T21:31:12.130264 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | EM already running

score 5 · Accepted Answer

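So, after all, this behavior seems to be normal in your particular case.

The Unicorn thread stack traces you provided (obtained with this method) point to the spawn_threadpool method in EventMachine. That code in EventMachine is invoked when some other code calls EventMachine.defer, which, on its first call, spawns a pool of 20 threads by default. I found EventMachine.defer being used in an old version of the pubnub gem (for example here), but it may be used elsewhere as well.

So I think this explains the large number of threads you observe on each worker. They are mostly waiting in the pop method, which suspends the thread until something is pushed onto the queue (again, by a defer call in EventMachine). So unless you have a lot of deferred operations, the threads mostly do nothing.

If you don't need 20 threads for deferrable operations on each Unicorn worker (most likely you don't), you can try lowering the number of threads in the pool by setting the threadpool_size variable to some reasonable number, for example: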
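
To illustrate why such threads are harmless while idle, here is a simplified stand-in for a deferred-work pool, written with plain Ruby threads and Queue. It is not EventMachine's actual implementation; the method name run_idle_pool_demo, the pool size of 3, and the nil shutdown signal are all assumptions of this sketch.

```ruby
# Simplified model of a thread pool whose workers block in Queue#pop.
def run_idle_pool_demo(size = 3)
  work_queue = Queue.new
  results = Queue.new

  pool = Array.new(size) do
    Thread.new do
      # Each worker parks in pop until a job (or nil) arrives.
      while (job = work_queue.pop)
        results << job.call
      end
    end
  end

  # Wait until every worker is blocked in pop, then record its status.
  sleep 0.01 until pool.all? { |t| t.status == "sleep" }
  idle_statuses = pool.map(&:status)

  # Pushing a job wakes one worker, which runs it and reports back.
  work_queue << -> { 6 * 7 }
  answer = results.pop

  # In this sketch, nil tells a worker to exit its loop.
  size.times { work_queue << nil }
  pool.each(&:join)

  { idle_statuses: idle_statuses, answer: answer }
end
```

While the queue is empty every worker reports the status "sleep", which matches a dump full of frames sitting in pop: the threads exist, but they consume no CPU until something is deferred.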

EventMachine.threadpool_size = 5

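I would put it somewhere in after_fork in the Unicorn config.

Also, as another option, you may consider using the unicorn-worker-killer gem to kill Unicorn workers periodically.

By the way, the messages pubnub writes to its log seem fine, since it is just telling us that it found an already initialized EventMachine thread, so it doesn't have to start a new one. This source code makes that clear.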

score 1 · Accepted Answer

Ran into this issue today with version 4. When using PubNub in a background worker the thread count would continue to climb until we got an error. The solution was as follows:

client = Pubnub.new(...)
client.publish(...)
client.telemetry.terminate
