python - PyTorch is very slow and uses a lot of GPU memory when used in Starlette with WEB_CONCURRENCY > 1

Question

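I am trying to build an API that uses a PyTorch model. However, as soon as I raise WEB_CONCURRENCY above 1, it creates more threads than expected and slows down considerably, even when I send only a single request.

Example code: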

api.sh

export WEB_CONCURRENCY=2

python api.py

api.py

import os

import uvicorn
from starlette.applications import Starlette
from starlette.responses import UJSONResponse
from starlette.middleware.gzip import GZipMiddleware
from mymodel import Model


model = Model()
app = Starlette(debug=False)
app.add_middleware(GZipMiddleware, minimum_size=1000)


@app.route('/process', methods=['GET', 'POST', 'HEAD'])
async def add_styles(request):
    if request.method == 'GET':
        params = request.query_params
    elif request.method == 'POST':
        params = await request.json()
    elif request.method == 'HEAD':
        # NOTE: response_header is not defined anywhere in this snippet.
        return UJSONResponse([], headers=response_header)

    print('===Request body===')
    print(params)

    model_output = model(params.get('data', [])) # It is very simplified. Inside there are 
                                                 # many things that are happening, which 
                                                 # involve file reading/writing 
                                                 # and spawning processes with `popen` that 
                                                 # do even more processing. But I don't 
                                                 # think that should be an issue here.

    return model_output


if __name__ == '__main__':
    uvicorn.run('api:app', host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

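With WEB_CONCURRENCY=1 in api.sh, nvidia-smi shows only 1 python process while the API is running, and the model uses 1.2GB of VRAM. A request takes about 0.7 seconds.

With WEB_CONCURRENCY=2 in api.sh, nvidia-smi shows more than 8 python processes, which together use more than ~8GB of VRAM. If I am lucky and do not hit an out-of-memory error, a single request can take up to 3 seconds.

I am using Python 3.8.

Why doesn't PyTorch use the expected 2.4GB of VRAM with WEB_CONCURRENCY=2? And why does it slow down so much?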
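
One way to check the process count and VRAM figures quoted above is to ask nvidia-smi for machine-readable output and sum it per process. The sketch below is only an illustration: it assumes the `--query-compute-apps=pid,process_name,used_memory` fields with `--format=csv,noheader,nounits` (memory reported in MiB), and the `SAMPLE` text, the helper names and the PIDs are made up for the example rather than taken from the setup in the question.

```python
import subprocess
from typing import Dict, List, Tuple, Union

Row = Dict[str, Union[int, str]]

# Assumed nvidia-smi invocation: one CSV line per compute process, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]

# Hypothetical output, used here so the example runs without a GPU.
SAMPLE = """\
12345, python, 1200
12346, python, 1180
777, /usr/lib/xorg/Xorg, 40
"""


def parse_compute_apps(csv_text: str) -> List[Row]:
    """Turn the CSV text into a list of {pid, name, used_mib} rows."""
    rows: List[Row] = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip malformed lines and non-numeric values such as "[N/A]".
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2].isdigit():
            continue
        rows.append({"pid": int(parts[0]), "name": parts[1], "used_mib": int(parts[2])})
    return rows


def summarize(rows: List[Row], needle: str = "python") -> Tuple[int, int]:
    """Return (process count, total MiB) for processes whose name contains needle."""
    matching = [r for r in rows if needle in str(r["name"])]
    return len(matching), sum(int(r["used_mib"]) for r in matching)


def query_gpu_processes() -> str:
    """Run nvidia-smi and return its raw CSV output (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout


count, total_mib = summarize(parse_compute_apps(SAMPLE))
print(f"{count} python processes using {total_mib} MiB in total")
```

On a machine with an NVIDIA driver, `summarize(parse_compute_apps(query_gpu_processes()))` would give the live figures. Comparing the count against WEB_CONCURRENCY shows whether more processes than expected have loaded the model.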

score 0 · Accepted Answer

In case anyone else stumbles upon this, use gunicorn. It uses separate threads/processes, so no internal conflicts occur.

So instead of running it with: python api.py, use: gunicorn -w 2 api:app -k uvicorn.workers.UvicornWorker
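
The same two flags can also live in a gunicorn config file, which is plain Python. This is a minimal sketch rather than a tested deployment: `workers` and `worker_class` mirror `-w 2` and `-k uvicorn.workers.UvicornWorker`, while the file name `gunicorn.conf.py`, the `bind` line and the reuse of the `WEB_CONCURRENCY` and `PORT` variables are assumptions carried over from api.sh and api.py.

```python
# gunicorn.conf.py (file name chosen for this example)
import os

# Equivalent of `-w 2`; assumption: fall back to 2 when WEB_CONCURRENCY is unset.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# Equivalent of `-k uvicorn.workers.UvicornWorker`, so each worker serves the ASGI app.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumption: same host and PORT default as the uvicorn.run call in api.py.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")
```

It would be started with `gunicorn -c gunicorn.conf.py api:app`. Because api.py builds `Model()` at import time, each worker should still load its own copy of the model, so VRAM use is expected to grow with the worker count.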
