Description
Environment
TensorRT version: 8.2.3.0
NVIDIA GPU: GTX 1080 Ti
NVIDIA driver version: 470.103.01
CUDA version: 11.4
cuDNN version: 8.2
Operating system: Linux 18.06
Python version (if applicable): 3.8.0
TensorFlow version (if applicable):
PyTorch version (if applicable): 1.10
Bare metal or container (if so, version):
gRPC server code
server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
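For reference, the worker count mentioned below is the max_workers argument of ThreadPoolExecutor (the snippet above passes no argument, so Python chooses a default). A minimal sketch, without gRPC, of how that argument controls which threads handle the work (the worker function here is only a placeholder):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def worker(_):
    # Report which pool thread handled this task.
    return threading.current_thread().name

# max_workers=1: every task runs on the same single thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    single = set(pool.map(worker, range(8)))

# max_workers=4: tasks may run concurrently on up to four threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    multi = set(pool.map(worker, range(100)))

print(len(single))  # 1
```

With more than one worker thread, each request into the servicer may touch the shared CUDA context from a different thread, which is where the problem below starts.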
gRPC stub initialization
grpcObject(encoder=trt_model, decoder=decoder)
trt_model initialization code
def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
Hello. I am serving TensorRT inference over gRPC. After setting max_workers on gRPC's thread pool, the error below occurs whenever multiple clients send requests at the same time. With max_workers=1 the error does not occur. Could you help me?
Inference method
def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    if self.cuda_ctx:
        self.cuda_ctx.push()
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
    h_input_signal = cuda.register_host_memory(np.ascontiguousarray(to_numpy(input_signal)))
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(bindings=[int(self.d_input), int(self.d_output)], stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()
    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
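One workaround I have seen for this class of error (not from the original code) is to serialize all calls into the shared CUDA context with a lock, so that only one pool thread at a time runs infer(), even when the gRPC server has many workers. A minimal sketch of the pattern, with a placeholder computation standing in for the TensorRT/PyCUDA calls:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SerializedModel:
    """Hypothetical stand-in for the trt_model class: the real infer()
    pushes/pops a single shared CUDA context, which is not thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes access to the shared context
        self._busy = False             # demo-only flag to detect overlapping calls

    def infer(self, x):
        with self._lock:
            # Inside the lock, context push / execute / pop would happen
            # exactly as in the infer() method above, one caller at a time.
            assert not self._busy, "concurrent entry into infer()"
            self._busy = True
            result = x * 2             # placeholder for the real GPU work
            self._busy = False
            return result

model = SerializedModel()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = sorted(pool.map(model.infer, range(10)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

This trades concurrency for correctness; a per-thread execution context (one TensorRT execution context and stream per worker) is the usual alternative when parallel inference is actually needed.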
Error
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
    stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
  File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
    model_output = actor.infer('aaa.wav')
  File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered