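I have an application that distributes load to a bunch of workers. So far, all workers run on the same VM; there has been no need to scale out yet. My problem is that roughly every 3-4 days, some workers crash with the following error message, which (I guess) means the client and the rabbitmq server have had no contact for 1200 seconds: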
Traceback (most recent call last):
  File "/var/www/vhosts/niklas/workers/builder.py", line 170, in <module>
    BuildWorker().main()
  File "/var/www/vhosts/niklas/lib/worker.py", line 29, in main
    self.msgs.ch.start_consuming()
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 722, in start_consuming
    self.connection.process_data_events()
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 93, in process_data_events
    self.process_timeouts()
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 99, in process_timeouts
    self._call_timeout_method(self._timeouts.pop(timeout_id))
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 164, in _call_timeout_method
    timeout_value['method']()
  File "/usr/local/lib/python2.6/dist-packages/pika/heartbeat.py", line 85, in send_and_check
    return self._close_connection()
  File "/usr/local/lib/python2.6/dist-packages/pika/heartbeat.py", line 106, in _close_connection
    HeartbeatChecker._STALE_CONNECTION % duration)
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 75, in close
    self.process_data_events()
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 91, in process_data_events
    self._handle_timeout()
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 198, in _handle_timeout
    self._on_connection_closed(None, True)
  File "/usr/local/lib/python2.6/dist-packages/pika/adapters/blocking_connection.py", line 235, in _on_connection_closed
    raise exceptions.AMQPConnectionError(*self.closing)
pika.exceptions.AMQPConnectionError: (320, 'Too Many Missed Heartbeats, No reply in 1200 seconds')
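To make the "1200 seconds" figure concrete, here is a simplified, hypothetical model of a client-side heartbeat checker. It is not pika's actual code: the class name HeartbeatModel, the idea of counting received bytes once per interval, and the values interval=600 and max_idle_count=2 are all assumptions, chosen so that 2 × 600 = 1200 matches the message above. (600 seconds is believed to be the RabbitMQ 3.0 default heartbeat, but nothing in the traceback confirms the negotiated value.)

```python
class HeartbeatModel:
    """Hypothetical, simplified model of a client-side heartbeat checker.

    Not pika's implementation: it only records whether any bytes arrived
    since the previous check and reports the connection as stale after a
    fixed number of consecutive idle intervals.
    """

    # Message format copied from the error text in the traceback.
    STALE_MSG = "Too Many Missed Heartbeats, No reply in %i seconds"

    def __init__(self, interval, max_idle_count=2):
        self.interval = interval              # assumed negotiated heartbeat interval (s)
        self.max_idle_count = max_idle_count  # assumed number of tolerated idle intervals
        self.idle_count = 0                   # consecutive intervals with no incoming data
        self.bytes_received = 0               # running total of bytes read from the broker
        self.last_seen = 0                    # bytes_received at the previous check

    def on_bytes(self, n):
        """Record that n bytes (heartbeat or any other frame) were read."""
        self.bytes_received += n

    def check(self):
        """Run once per interval; return an error string if the peer looks dead."""
        if self.bytes_received == self.last_seen:
            self.idle_count += 1          # nothing arrived during this interval
        else:
            self.idle_count = 0           # data arrived, so the connection is alive
            self.last_seen = self.bytes_received
        if self.idle_count >= self.max_idle_count:
            return self.STALE_MSG % (self.max_idle_count * self.interval)
        return None


if __name__ == "__main__":
    checker = HeartbeatModel(interval=600)
    checker.on_bytes(8)        # some frame arrives during the first interval
    print(checker.check())     # None: data was seen
    print(checker.check())     # None: first idle interval
    print(checker.check())     # Too Many Missed Heartbeats, No reply in 1200 seconds
```

Under this model, the error fires only when the client goes two full intervals without reading anything from the socket. That can happen if the broker stops sending, or if the client process is not getting a chance to read from the socket at all.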
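My question is: what could cause this? It only happens to about a third of the workers; the others keep running fine without any error messages or warnings (again, all workers and the rabbitmq-server are on the same VM). I'm using the standard method start_consuming() from the Python library pika to retrieve new requests. My code is too large to post here, and given the error message, the problem seems to lie outside my code, or to be a system issue.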
I'm using:
- Python pika 0.9.8
- RabbitMQ 3.0.0
- Debian 6.0
- All workers are started inside screen sessions
- The VM is hosted at Linode, with 512MB RAM