
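A fully loaded multi-tenant Django application with 1000 WebSockets using Daphne/Channels had been running fine for a few months. Suddenly tenants were all calling the support line to report that the application was running slowly or hanging outright. I narrowed it down to the WebSockets, because HTTP REST API hits come back fast and error free.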

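None of the application logs or OS logs indicate a problem, so the only thing to go on is the exception shown below. It occurred here and there, over and over, across two days.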

I'm not expecting any in-depth debugging help, just some off-the-cuff suggestions about possible causes.

AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13

Split configuration: HTTP is served by uwsgi, ASGI is served by daphne, both behind Nginx.

May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
    real_channel
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
    await self.receive_clean_locks.acquire(channel_key)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
    return await self.locks[channel].acquire()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
    yield from fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
    return await self.inner(receive, self.send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
    await inner_instance(receive, send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
    [receive, self.channel_receive], self.dispatch
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
    await task
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
    self.receive_lock.release()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
    raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.

1 Answer


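First, let's look at where the `RuntimeError: Lock is not acquired.` error comes from. As the traceback shows, it is raised by the release() method in /opt/python3.6/lib/python3.6/asyncio/locks.py, which is defined as follows: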

    def release(self):
        """Release a lock.

        When the lock is locked, reset it to unlocked, and return.
        If any other coroutines are blocked waiting for the lock to become
        unlocked, allow exactly one of them to proceed.

        When invoked on an unlocked lock, a RuntimeError is raised.

        There is no return value.
        """
        if self._locked:
            self._locked = False
            self._wake_up_first()
        else:
            raise RuntimeError('Lock is not acquired.')

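A primitive lock is a synchronization primitive that is not owned by a particular coroutine when locked. It is in one of two states, locked or unlocked.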

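Calling release() on an unlocked lock raises a RuntimeError, because the method may only be called while the lock is in the locked state. When it is called in the locked state, the state changes to unlocked.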
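The behaviour is easy to confirm in isolation. The following is a minimal sketch, assuming Python 3.7 or later for `asyncio.run` (the question's environment is 3.6, where you would drive the event loop by hand); the function name `release_unlocked_lock` is made up for illustration.

```python
import asyncio


async def release_unlocked_lock():
    """Call release() on a lock that was never acquired."""
    lock = asyncio.Lock()
    try:
        # The lock is in the unlocked state, so this raises RuntimeError.
        lock.release()
    except RuntimeError as exc:
        return str(exc)
    return None


message = asyncio.run(release_unlocked_lock())
print(message)
```

The message captured here is the same text that appears on the last line of the traceback in the question.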

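Now for the earlier error, which was raised inside the acquire() method in the same file. The acquire() method is defined as follows. (This listing appears to come from a newer CPython release than 3.6, since it uses `async def` and `exceptions.CancelledError`; the 3.6 version in the traceback waits with `yield from fut` at the corresponding point, and the waiting logic is essentially the same.)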

    async def acquire(self):
        """Acquire a lock.

        This method blocks until the lock is unlocked, then sets it to
        locked and returns True.
        """
        if (not self._locked and (self._waiters is None or
                all(w.cancelled() for w in self._waiters))):
            self._locked = True
            return True

        if self._waiters is None:
            self._waiters = collections.deque()
        fut = self._loop.create_future()
        self._waiters.append(fut)

        # Finally block should be called before the CancelledError
        # handling as we don't want CancelledError to call
        # _wake_up_first() and attempt to wake up itself.
        try:
            try:
                await fut
            finally:
                self._waiters.remove(fut)
        except exceptions.CancelledError:
            if not self._locked:
                self._wake_up_first()
            raise

        self._locked = True
        return True

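So the concurrent.futures._base.CancelledError you are seeing is raised at `await fut` (`yield from fut` in your traceback): the task was cancelled while it was waiting for the lock. The traceback then shows the RuntimeError being raised while that CancelledError was being handled, when receive() in channels_redis/core.py called self.receive_lock.release() on a lock that was not held at that moment.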
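The shape of the chained traceback can be reproduced with a small sketch. This is a hypothetical simplification and not the actual channels_redis code: the names `receive`, `outer_lock` and `inner_lock` are assumptions for illustration, and `asyncio.run` assumes Python 3.7 or later. A task blocks on one lock, gets cancelled, and a `finally` clause then releases a different lock that was never acquired.

```python
import asyncio


async def receive(outer_lock, inner_lock):
    # Hypothetical pattern: the finally clause releases outer_lock
    # even though this code path never acquired it.
    try:
        await inner_lock.acquire()  # blocks here until cancelled
    finally:
        outer_lock.release()  # raises RuntimeError: lock is not held


async def main():
    outer_lock = asyncio.Lock()
    inner_lock = asyncio.Lock()
    await inner_lock.acquire()  # simulate another holder of the inner lock

    task = asyncio.ensure_future(receive(outer_lock, inner_lock))
    await asyncio.sleep(0)  # let the task start and block on acquire()
    task.cancel()  # CancelledError is thrown into the pending acquire()

    try:
        await task
    except RuntimeError as exc:
        # __context__ holds the exception that was being handled.
        return str(exc), type(exc.__context__).__name__
    return None


result = asyncio.run(main())
print(result)
```

If a pattern like this is involved, `async with lock:` is one way to avoid it, since it only releases after a successful acquire.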

To fix it, you can look at the question "Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set".

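Basically, you may have an awaitable somewhere in your code that you are not awaiting. By not awaiting it, you never hand control back to the event loop, and you never store the awaitable, so it gets cleaned up immediately, which cancels it completely (along with all the awaitables it controls).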

Just make sure that you await the result of every awaitable in your code, and track down any that you missed.
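As a rough illustration of that advice, the sketch below keeps a reference to a scheduled task and awaits everything it starts. The coroutine `fetch_value` is a made-up stand-in for whatever work your consumers do, and `asyncio.run` assumes Python 3.7 or later.

```python
import asyncio


async def fetch_value():
    # Stand-in for real asynchronous work.
    await asyncio.sleep(0)
    return 42


async def main():
    # Keep a reference to the task so it is not garbage collected
    # before it finishes.
    task = asyncio.ensure_future(fetch_value())

    # Await both the stored task and a bare coroutine; calling
    # fetch_value() without awaiting it would never run its body.
    return await asyncio.gather(task, fetch_value())


results = asyncio.run(main())
print(results)
```

Running your service with asyncio's debug mode enabled can also help, since it reports coroutines that were never awaited.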

answered 2021-06-27T13:30:13.403