Description
Environment
TensorRT version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA driver version: 470.103.01
CUDA version: 11.4
cuDNN version: 8.2
Operating system: Linux 18.06
Python version (if applicable): 3.8.0
TensorFlow version (if applicable):
PyTorch version (if applicable): 1.10
Bare metal or container (if so, version):
gRPC server code
server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
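For reference, the worker count mentioned below is the max_workers argument of ThreadPoolExecutor (the snippet above passes no argument, so Python chooses a default). A minimal sketch, without gRPC, of how that argument controls which threads handle the work (the worker function here is only a placeholder):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def worker(_):
    # Report which pool thread handled this task.
    return threading.current_thread().name

# max_workers=1: every task runs on the same single thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    single = set(pool.map(worker, range(8)))

# max_workers=4: tasks may run concurrently on up to four threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    multi = set(pool.map(worker, range(100)))

print(len(single))  # 1
```

With more than one worker thread, each request into the servicer may touch the shared CUDA context from a different thread, which is where the problem below starts.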
gRPC stub initialization
grpcObject(encoder=trt_model, decoder=decoder)
trt_model initialization code
def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
Hello. I am serving TensorRT inference over gRPC. After setting max_workers on gRPC's thread pool, the error below occurs whenever multiple clients send requests at the same time. With max_workers=1 the error does not occur. Could you help me?
Inference method
def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    if self.cuda_ctx:
        self.cuda_ctx.push()
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
    h_input_signal = cuda.register_host_memory(np.ascontiguousarray(to_numpy(input_signal)))
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(bindings=[int(self.d_input), int(self.d_output)], stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()
    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
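One workaround I have seen for this class of error (not from the original code) is to serialize all calls into the shared CUDA context with a lock, so that only one pool thread at a time runs infer(), even when the gRPC server has many workers. A minimal sketch of the pattern, with a placeholder computation standing in for the TensorRT/PyCUDA calls:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SerializedModel:
    """Hypothetical stand-in for the trt_model class: the real infer()
    pushes/pops a single shared CUDA context, which is not thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes access to the shared context
        self._busy = False             # demo-only flag to detect overlapping calls

    def infer(self, x):
        with self._lock:
            # Inside the lock, context push / execute / pop would happen
            # exactly as in the infer() method above, one caller at a time.
            assert not self._busy, "concurrent entry into infer()"
            self._busy = True
            result = x * 2             # placeholder for the real GPU work
            self._busy = False
            return result

model = SerializedModel()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = sorted(pool.map(model.infer, range(10)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

This trades concurrency for correctness; a per-thread execution context (one TensorRT execution context and stream per worker) is the usual alternative when parallel inference is actually needed.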
Error
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered