
I have a Celery task that is supposed to run in an infinite loop, listening on some queues in RabbitMQ (unrelated to Celery's internals). When a message is retrieved from one of those queues, this long-running task dispatches the message to be processed by other tasks.

How can such a use case be implemented properly in Celery?

I run Celery with a concurrency of 3 and the -Ofair flag.
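For reference, the worker is started with something like the following (a sketch; the proj module name is a placeholder for the actual Celery app module):

celery -A proj worker -l info -c 3 -Ofair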

My observation so far is that after a few days this setup stops processing tasks from Celery's internal queues. For some reason the long-running task seems to get restarted, and eventually all 3 worker processes are busy with it alone, so no worker is left to handle the tasks in the Celery queues.

I have considered a file-based lock to ensure that only one worker can acquire the lock and run this long-running task, but I am not sure it is a good option, and I suspect there is a better solution to this problem.
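For comparison, here is a minimal sketch of a cache-based lock along the lines of the "ensure a task only runs once at a time" pattern from the Celery documentation, assuming a cache backend shared by all workers with an atomic add (e.g. memcached); the key name and timeout are placeholders, and release_lock matches the helper already referenced in the commented-out on_iteration hook below:

from django.core.cache import cache

LOCK_ID = 'couriers-consumer-lock'   # hypothetical key name
LOCK_EXPIRE = 60 * 10                # seconds; should outlive one consumer run

def acquire_lock():
    # cache.add only stores the key if it does not already exist and
    # reports whether it did, so at most one worker can hold the lock.
    return cache.add(LOCK_ID, 'true', LOCK_EXPIRE)

def release_lock():
    cache.delete(LOCK_ID)

The task would then call acquire_lock() before starting the consumer and simply return if the lock is already held, instead of relying on a file-based lock.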

from celery.utils.log import get_task_logger
from kombu import Connection
from kombu.mixins import ConsumerMixin

logger = get_task_logger(__name__)

# Assumed to be registered as a bound task on the project's Celery app
# instance, since the body uses self.retry() below.
@app.task(bind=True)
def init_couriers_consumers(self):
    logger.info("lock acquired")
    logger.info("TASK ID: {}".format(init_couriers_consumers.request.id))
    with Connection('amqp://guest:guest@localhost:5672//') as conn:
        couriers_consumer_worker = ConsumerWorker(conn)
        couriers_consumer_worker.run()  # blocks until should_stop is set
        couriers_consumer_worker.should_stop = False  # reset after run() returns
        # cache.set('reboot', False)
        self.retry(countdown=2)  # re-queue the task so consuming starts again


class ConsumerWorker(ConsumerMixin):

    def __init__(self, connection):
        self.connection = connection
        self._create_queues()

    def _create_queues(self):
        from courier.models import Courier
        self.queues = []
        logger.info("create_queues")
        for courier in Courier.objects.filter(user__is_active=True):
            logger.info("create_queue for courier: {}".format(courier.user.username))
            self._create_courier_queues(courier.user.username)

    def _create_courier_queues(self, courier_username):
        self.queues.append(QueuesFactory.get_consumer_order_status_queue(courier_username))
        self.queues.append(QueuesFactory.get_consumer_status_queue(courier_username))
        self.queues.append(QueuesFactory.get_consumer_gps_queue(courier_username))

    def get_consumers(self, Consumer, channel):
        logger.info("Subscribing to queues: {}".format(str(self.queues)))
        return [Consumer(queues=self.queues,
                         callbacks=[self.process_message])]

    def process_message(self, body, message):
        logger.info("process message")
        from courier.api.tasks import process_message_task, error_handler_task
        process_message_task.apply_async((message.delivery_info['routing_key'], message.payload), link_error=error_handler_task.s())
        logger.info("after process message")
        message.ack()

    def on_connection_revived(self):
        logger.info("revived")

    def on_consume_ready(self, connection, channel, consumers, **kwargs):
        logger.info("on consumer ready")

    def on_consume_end(self, connection, channel):
        logger.info("on consume end")

    # def on_iteration(self):
    #     if cache.get('reboot'):
    #         logger.info("SHOULD STOP")
    #         self.should_stop = True
    #         release_lock()

Logs after a restart:

[2016-11-14 15:47:36,652: INFO/MainProcess] Connected to amqp://guest:**@localhost:5672//
[2016-11-14 15:47:36,665: INFO/MainProcess] mingle: searching for neighbors
[2016-11-14 15:47:37,677: INFO/MainProcess] mingle: all alone
[2016-11-14 15:47:37,692: WARNING/MainProcess] celery@ip-178-216-202-251.e24cloud.com ready.
[2016-11-14 15:47:39,686: INFO/MainProcess] Received task: courier.api.consumers.init_couriers_consumers[couriers_consumer]
[2016-11-14 15:47:39,686: INFO/MainProcess] Received task: courier.api.consumers.init_producer_queues[91d7c307-8eed-4966-83ad-8b001e2459e5]
[2016-11-14 15:47:39,687: INFO/Worker-2] lock acquired
[2016-11-14 15:47:39,688: INFO/Worker-2] TASK ID: couriers_consumer
[2016-11-14 15:47:39,692: INFO/Worker-2] create_queues
[2016-11-14 15:47:40,308: INFO/Worker-2] create_queue for courier: courier1
[2016-11-14 15:47:40,322: INFO/Worker-2] revived
[2016-11-14 15:47:40,322: INFO/Worker-2] Connected to amqp://guest:**@localhost:5672//
[2016-11-14 15:47:40,325: INFO/Worker-2] Subscribing to queues: [<unbound Queue from/courier1/order/status -> <unbound Exchange couriers(direct)> -> from/courier1/order/status>, <unbound Queue from/courier1/status -> <unbound Exchange couriers(direct)> -> from/courier1/order/status>, <unbound Queue from/courier1/gps -> <unbound Exchange couriers(direct)> -> from/courier1/gps>]
[2016-11-14 15:47:40,333: INFO/Worker-2] on consumer ready
[2016-11-14 15:47:40,554: INFO/MainProcess] Task courier.api.consumers.init_producer_queues[91d7c307-8eed-4966-83ad-8b001e2459e5] succeeded in 0.864124746993s: None

But after a few days I see this (grepping for "revived"):

[2016-11-13 05:35:09,502: INFO/Worker-1] revived
[2016-11-14 05:58:17,716: INFO/Worker-3] revived
[2016-11-14 12:33:25,774: INFO/Worker-2] revived

This presumably means that every worker ended up inside this long-running task, but I am not sure how that state came about.
