
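We are seeing a strange problem with Celery when the MainProcess loses its connection to the broker. Celery logs the error below and then starts using 100% CPU. The workers continue to work normally. I can see that RabbitMQ considers the connection to have timed out. We have been hitting these errors frequently since updating to Celery 3.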

I have a feeling it is related to the non-blocking message consumption, but I haven't really made progress in understanding the code.

Is there any way to detect these disconnects earlier, or to prevent Celery from using 100% CPU?

  • Celery: 3.0.4
  • AMQP: 1.0.10
  • RabbitMQ: 2.8.4

The timestamps below are 2 hours apart because RabbitMQ reports GMT and Celery reports local time.

Celery error

[2013-05-09 18:20:20,204: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 369, in start
    self.consume_messages()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 450, in consume_messages
    readers[fileno](fileno, event)
  File "/usr/local/lib/python2.7/site-packages/kombu/connection.py", line 290, in drain_nowait
    self.drain_events(timeout=0)
  File "/usr/local/lib/python2.7/site-packages/kombu/connection.py", line 279, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in drain_events
    return connection.drain_events(**kwargs)
  File "/usr/local/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
    chanmap, None, timeout=timeout,
  File "/usr/local/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/usr/local/lib/python2.7/site-packages/amqp/connection.py", line 299, in read_timeout
    return self.method_reader.read_method()
  File "/usr/local/lib/python2.7/site-packages/amqp/method_framing.py", line 187, in read_method
    raise m
IOError: Socket closed

RabbitMQ error

=ERROR REPORT==== 9-May-2013::16:20:20 ===
closing AMQP connection <0.1813.0> (192.168.201.104:12809 -> 192.168.201.104:5672):
{timeout,running}

1 Answer


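I turned off broker heartbeats, which seems to have resolved the problem. However, I'm not entirely sure, because I have no way to reproduce the error.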
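For anyone wanting to try the same thing, here is a minimal sketch of what turning off broker heartbeats could look like in a Celery 3.x configuration module. The file name `celeryconfig.py` is only an assumption about how the worker is configured, and the use of `0` as the "disabled" value for the `BROKER_HEARTBEAT` setting should be checked against the configuration reference for the exact Celery version in use.

```python
# celeryconfig.py (hypothetical module name; use whatever module your
# worker already loads its settings from)

# BROKER_HEARTBEAT is the Celery 3.x setting for AMQP heartbeats on the
# pyamqp transport. Assumption: a value of 0 disables them, so the broker
# no longer closes the connection for missed heartbeats.
BROKER_HEARTBEAT = 0
```

Note that with heartbeats off, a dead connection may only be noticed when the socket itself fails, so this trades earlier detection for fewer heartbeat-related disconnects.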

Answered 2013-05-16T06:55:42.080