Here is my full traceback:
Traceback (most recent call last):
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result
    request=request, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set
    return self.ensure(self._set, (key, value), **retry_policy)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure
    **retry_policy
  File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time
    return fun(*args, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set
    pipe.execute()
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute
    return execute(conn, stack, raise_on_error)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction
    connection.send_packed_command(all_cmds)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command
    self.connect()
  File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error.
[2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),).
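I did search for ConnectionError, but none of the results matched my problem.
My platform is Ubuntu 14.04. This is part of my redis configuration. (If you need the whole redis.conf file, I can share it. By the way, all parameters in the LIMITS section are disabled.)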
# By default Redis listens for connections from all the network interfaces
# available on the server. It is possible to listen to just one or multiple
# interfaces using the "bind" configuration directive, followed by one or
# more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
bind 127.0.0.1
# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /var/run/redis/redis.sock
# unixsocketperm 755
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
# equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 60 seconds.
tcp-keepalive 60
Here is my mini redis wrapper:
import redis
from django.conf import settings

REDIS_POOL = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)


def get_redis_server():
    return redis.Redis(connection_pool=REDIS_POOL)
And this is how I use it:
from celery import shared_task
from redis_wrapper import get_redis_server


# the view and the task run in different, independent processes
def sample_view(request):
    rs = get_redis_server()
    # some get-set stuff with redis


@shared_task
def sample_celery_task():
    rs = get_redis_server()
    # some get-set stuff with redis
Package versions:
celery==3.1.18
django-celery==3.1.16
kombu==3.0.26
redis==2.10.3
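So the problem is this: the connection error starts occurring some time after the celery workers are started. After the first occurrence, every task ends with the same error until I restart all of my celery workers. (Interestingly, celery flower also fails during that problematic period.)
I suspect my usage of the redis connection pool, or the redis configuration, or (less likely) a network issue. Any ideas about the cause? What am I doing wrong?
(P.S. I will add the redis-cli info output when I see this error today.)
UPDATE: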
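I have temporarily worked around the problem by adding the --maxtasksperchild argument to my worker start command, set to 200. Of course this is not the right way to fix it; it only treats the symptom. It periodically recycles the worker processes (once a process has handled 200 tasks, it is closed and a new one is created), which also refreshes my global redis pool and its connections. So I think I should focus on how the global redis connection pool is used, and I am still waiting for new ideas and comments.
Sorry for my poor English, and thanks in advance.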
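To illustrate the recycling behaviour I am relying on, here is a minimal sketch using the standard library's multiprocessing.Pool, which accepts a maxtasksperchild argument of its own. It is not Celery's code, and I am only assuming the worker option behaves in a comparable way. The helper name `pids_for_tasks` is made up for this example, and the "fork" start method is assumed (Linux, as on my server).

```python
import multiprocessing
import os


def pids_for_tasks(n_tasks, tasks_per_child):
    """Run n_tasks trivial tasks and return the pid that handled each one."""
    # "fork" is an assumption that matches my Ubuntu setup.
    ctx = multiprocessing.get_context("fork")
    # A single worker, replaced after it has completed tasks_per_child tasks.
    with ctx.Pool(processes=1, maxtasksperchild=tasks_per_child) as pool:
        # Each apply() call submits exactly one task and waits for its result.
        return [pool.apply(os.getpid) for _ in range(n_tasks)]


if __name__ == "__main__":
    pids = pids_for_tasks(6, 2)
    print(pids)
    print(len(set(pids)), "distinct worker processes handled", len(pids), "tasks")
```

Any state that lives inside a worker process, such as connections it has opened, goes away when that process is replaced, which is why the workaround hides the problem without explaining it.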