
Description

Environment

TensorRT version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA driver version: 470.103.01
CUDA version: 11.4
cuDNN version: 8.2
Operating system: Linux 18.06
Python version (if applicable): 3.8.0
TensorFlow version (if applicable):
PyTorch version (if applicable): 1.10
Bare metal or container (if container, version):

gRPC server code

server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
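For context, grpc.server() hands each incoming RPC to a thread from the ThreadPoolExecutor, so with the default worker count several handlers can run at the same time. The stdlib-only sketch below (all names are hypothetical, not from the post) uses a Barrier to show that with max_workers=4 four handlers really are inside the handler function simultaneously, which is why anything they share must be thread-safe:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N = 4
barrier = threading.Barrier(N)  # releases only once all N workers arrive
active = 0
peak = 0
lock = threading.Lock()

def handler(_):
    """Stand-in for an RPC handler; tracks how many run concurrently."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    barrier.wait()  # all N workers are inside the handler at this point
    with lock:
        active -= 1

with ThreadPoolExecutor(max_workers=N) as pool:
    list(pool.map(handler, range(N)))
print("peak concurrent handlers:", peak)  # → 4
```

With max_workers=1 the same sketch would deadlock at the barrier, because only one handler can ever be in flight — which mirrors why the error below disappears in the single-worker case.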

gRPC stub initialization

grpcObject(encoder=trt_model, decoder=decoder)

trt_model initialization code

def __init__(self):
    # make_context() creates a new CUDA context on device 0 and makes it
    # current on the calling thread (pushes it onto that thread's context stack)
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...

Hello. I am serving TensorRT through gRPC. After setting max_workers on the gRPC thread pool, the error below occurs whenever multiple clients send requests at the same time. With max_workers=1, the error does not occur. Could you help me?
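For illustration, here is a stdlib-only sketch (every name in it — FakeEngine, safe_infer — is a hypothetical stand-in, not from the post) of one common pattern for this situation: a single threading.Lock so that only one worker thread at a time enters the non-thread-safe inference path, even while the gRPC pool itself keeps multiple workers:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeEngine:
    """Stand-in for a TensorRT wrapper that must not be entered concurrently."""
    def __init__(self):
        self._busy = False

    def infer(self, x):
        assert not self._busy, "illegal concurrent access"
        self._busy = True
        result = x * 2  # pretend GPU work
        self._busy = False
        return result

engine = FakeEngine()
infer_lock = threading.Lock()

def safe_infer(x):
    # Serialize access: push / infer / pop from different threads never interleave.
    with infer_lock:
        return engine.infer(x)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(safe_infer, range(8)))
print(results)  # → [0, 2, 4, 6, 8, 10, 12, 14]
```

This trades throughput for safety (it behaves like max_workers=1 for the GPU path while other RPC work stays concurrent); whether it fits here depends on details of the real wrapper not shown in the post.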

Inference method

def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)

    if self.cuda_ctx:
        self.cuda_ctx.push()  # make this object's CUDA context current on the calling thread

    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified

    # allocate page-locked host memory for the output binding
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)

    # copy input to device, run inference asynchronously, copy output back
    h_input_signal = cuda.register_host_memory(np.ascontiguousarray(to_numpy(input_signal)))
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(bindings=[int(self.d_input), int(self.d_output)], stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()

    if self.cuda_ctx:
        self.cuda_ctx.pop()  # detach the context from this thread again
    return h_output

Error

pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
