
When I try to retry a failed task, I intermittently (about 20% of the time) get an IOError exception from Celery.

Here is my task:

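import urllib2

from celery.task import task

@task
def update_data(pk_id):
    try:
        pk = PK.objects.get(pk=pk_id)
        results = pk.get_update()
        return results
    except urllib2.HTTPError as exc:
        # Re-queue the task to run again in 10 minutes (600 seconds).
        print "Let's retry in a few minutes."
        update_data.retry(exc=exc, countdown=600)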

The exception:

[2011-10-07 11:35:53,594: ERROR/MainProcess] Task report.tasks.update_data[1babd4e3-45eb-4fa3-a497-68b67bb4a6df] raised exception: IOError()
Traceback (most recent call last):
  File "/home/prj/prj_env/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
    return self.run(*args, **kwargs)
  File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
    return fun(*args, **kwargs)
  File "/home/prj/prj/report/tasks.py", line 109, in update_data
    update_data.retry(exc=exc, countdown=600)
  File "/home/prj/prj_env/lib/python2.6/site-packages/celery/app/task/__init__.py", line 520, in retry
    self.name, options["task_id"], args, kwargs))
HTTPError

RabbitMQ log:

=INFO REPORT==== 7-Oct-2011::15:35:43 ===
closing TCP connection <0.4294.17> from 10.254.122.225:59704

=WARNING REPORT==== 7-Oct-2011::15:35:43 ===
exception on TCP connection <0.4330.17> from 10.254.122.225:59715
connection_closed_abruptly

=INFO REPORT==== 7-Oct-2011::15:35:43 ===
closing TCP connection <0.4330.17> from 10.254.122.225:59715

=WARNING REPORT==== 7-Oct-2011::15:35:49 ===
exception on TCP connection <0.4313.17> from 10.254.122.225:59709
connection_closed_abruptly

=INFO REPORT==== 7-Oct-2011::15:35:49 ===
closing TCP connection <0.4313.17> from 10.254.122.225:59709

=WARNING REPORT==== 7-Oct-2011::15:35:49 ===
exception on TCP connection <0.4350.17> from 10.254.122.225:59720
connection_closed_abruptly

=INFO REPORT==== 7-Oct-2011::15:35:49 ===
closing TCP connection <0.4350.17> from 10.254.122.225:59720

=INFO REPORT==== 7-Oct-2011::15:36:22 ===
accepted TCP connection on [::]:5672 from 10.255.199.63:50526

=INFO REPORT==== 7-Oct-2011::15:36:22 ===
starting TCP connection <0.4501.17> from 10.255.199.63:50526

Any idea why this happens?

Thanks!


2 Answers


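Is it possible to save each task in a database and retry it if no result has arrived after some time? Or does the dispatcher perhaps have its own persistent storage? And what happens if a worker crashes while receiving or executing a task?

Retrying lost or failed tasks (Celery, Django and RabbitMQ)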

answered 2011-11-21T12:52:36.180

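max_retries defaults to 3 in Celery, so once the same task has failed 3 times in a row (which would be the roughly 20% of cases you are seeing), retry() re-raises the exception instead of scheduling another attempt.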
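
To make the counting concrete, here is a minimal sketch of that behaviour in plain Python. It is a simplified model, not Celery's implementation: `run_with_retries`, `make_flaky` and `UpstreamError` are hypothetical names invented for this illustration, with `UpstreamError` standing in for `urllib2.HTTPError`. The countdown delay is left out, since only the retry count matters here.

```python
class UpstreamError(IOError):
    """Hypothetical stand-in for urllib2.HTTPError."""


def run_with_retries(func, max_retries=3):
    """Call func until it succeeds or the retry budget is spent.

    Returns (result, executions). Once max_retries retries have been
    used, the next failure is re-raised instead of being retried.
    """
    retries = 0
    while True:
        try:
            return func(), retries + 1
        except UpstreamError:
            if retries >= max_retries:
                raise  # budget exhausted: the original error surfaces
            retries += 1


def make_flaky(failures):
    """Build a callable that fails `failures` times, then returns 'ok'."""
    state = {"calls": 0}

    def call():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise UpstreamError("simulated upstream failure")
        return "ok"

    call.state = state  # exposed so callers can inspect the call count
    return call
```

In this model a task that fails three times and then succeeds still returns normally, while a fourth consecutive failure propagates the error. If that matches what you are seeing, passing a larger max_retries to the task decorator (for example @task(max_retries=10)) should make the exception rarer, assuming the upstream errors really are transient.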

answered 2011-11-21T20:44:58.407