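I'm debugging some Postgres connection leaks in our app. A few days ago we suddenly went over 100 connections, which should not happen, since we only have 8 unicorn workers and one sidekiq process (25 threads).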

I was looking at htop today and saw that my unicorn workers are spawning a lot of threads. For example:

[htop screenshot]

Am I reading this right? This shouldn't be happening, should it? If these are spawned threads, any idea how to debug this?

Thanks! By the way, my other question (about the Postgres connections): Debugging unicorn postgres connection leak

EDIT

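I just followed some tips from here - http://varaneckas.com/blog/ruby-tracing-threads-unicorn/ - and when I printed the stack traces from a worker's threads, this is what I got when there were a lot of threads: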

[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] -------------------
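
A per-thread dump like the one above can also be condensed into a count per top stack frame, which makes a pool of identical idle threads easy to spot. This is only a minimal sketch using Ruby's built-in `Thread.list` and `Thread#backtrace`; `thread_summary` and `dump_thread_summary` are hypothetical helper names (not part of unicorn or EventMachine), and how the dump gets triggered inside a worker is left as an assumption.

```ruby
# Sketch: group threads by the top frame of their backtrace.
# thread_summary / dump_thread_summary are hypothetical helpers.
def thread_summary(threads = Thread.list)
  counts = Hash.new(0)
  threads.each do |t|
    # Thread#backtrace returns nil for a dead thread.
    frame = (t.backtrace || []).first || "(no backtrace)"
    counts[frame] += 1
  end
  counts
end

# Print the summary, most common frame first, prefixed with the PID
# in the same style as the trace above.
def dump_thread_summary(io = $stderr)
  io.puts "[#{Process.pid}] #{Thread.list.size} threads"
  thread_summary.sort_by { |_frame, n| -n }.each do |frame, n|
    io.puts "[#{Process.pid}] #{n} x #{frame}"
  end
end
```

Calling `dump_thread_summary` in a worker with many threads all parked on the same line should show one frame with a high count rather than pages of repeated traces.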

Here is my unicorn.rb: https://gist.github.com/steverob/b83e41bb49d78f9aa32f79136df5af5f. It spawns a thread for EventMachine in after_fork.

The reason for using EventMachine is this: https://github.com/keenlabs/keen-gem#asynchronous-publishing

Is this normal? Shouldn't the threads be killed? Could this also cause unnecessary DB connections to be opened? Thanks

UPDATE: I just found out that I'm using an old version of the PubNub gem that uses EM, and I came across these lines in the pubnub.log file -

D, [2016-04-06T21:31:12.130123 #1573] DEBUG -- pubnub: Created event Pubnub::Publish
D, [2016-04-06T21:31:12.130144 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire
D, [2016-04-06T21:31:12.130162 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Adding event to async_events
D, [2016-04-06T21:31:12.130178 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Starting railgun
D, [2016-04-06T21:31:12.130194 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | starting EM in new thread
D, [2016-04-06T21:31:12.130243 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | We aren't running on thin
D, [2016-04-06T21:31:12.130264 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | EM already running

2 Answers

So, after all, this behavior seems to be normal in your particular case.

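The unicorn thread stack trace you provided (obtained using this method) points to the spawn_threadpool method in EventMachine. This code in EventMachine is invoked when some other code calls EventMachine.defer, which by default spawns a pool of 20 threads on its first invocation. I found usages of EventMachine.defer in the old version of the pubnub gem (here, for example), but it may be used elsewhere as well.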

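So I think this explains the large number of threads you are observing on each worker. They are mostly waiting in the pop method, which suspends the thread until something is pushed onto the queue (again, by a defer in EventMachine). So unless you have a large number of deferred operations, the threads are mostly doing nothing.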
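
To illustrate the pattern described above, here is a simplified stand-in written in plain Ruby. It is not EventMachine's actual source: `LazyPool` is a hypothetical class, and only the general shape (a pool created on the first deferred call, with each thread blocked in `Queue#pop`) is meant to match.

```ruby
# Simplified stand-in for a lazily created thread pool.
# NOT EventMachine's code; LazyPool is a hypothetical name.
class LazyPool
  attr_reader :threads

  def initialize(size = 20)
    @size = size
    @queue = Queue.new
    @threads = []
  end

  # Roughly analogous to EventMachine.defer: the first call spawns
  # the whole pool, later calls only enqueue work.
  # (Not thread-safe; good enough for a sketch.)
  def defer(&work)
    spawn_threadpool if @threads.empty?
    @queue << work
  end

  private

  def spawn_threadpool
    @size.times do
      @threads << Thread.new do
        loop do
          job = @queue.pop # blocks here until work is pushed
          job.call
        end
      end
    end
  end
end

pool = LazyPool.new
pool.threads.size   # 0: nothing is spawned before the first defer
pool.defer { 1 + 1 }
pool.threads.size   # 20: the whole pool now exists, mostly idle in pop
```

A single deferred call is enough to leave 20 threads alive for the rest of the process, which is consistent with the many identical `pop` frames in the trace.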

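If you don't need 20 threads on each unicorn worker for deferrable operations (most likely you don't), you can try lowering the number of threads in the pool by setting the threadpool_size variable to some reasonable number, for example: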

EventMachine.threadpool_size = 5

I would put it somewhere in after_fork in the unicorn config.

Also, as another option, you could consider using the unicorn-worker-killer gem to kill unicorn workers periodically.

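By the way, the messages pubnub writes to its log seem fine: it is just telling us that it found an already initialized EventMachine thread, so it doesn't have to start a new one. This source code clarifies that.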

answered 2016-04-09T07:06:17.060

Ran into this issue today with version 4. When using PubNub in a background worker the thread count would continue to climb until we got an error. The solution was as follows:

client = Pubnub.new(...)
client.publish(...)
client.telemetry.terminate
answered 2019-11-25T22:05:15.883