I am using TensorRT in my Python code, so I use PyCUDA. In the inference code below, an "illegal memory access was encountered" error occurs at stream.synchronize():

def infer(engine, x, batch_size, context):  
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    img = np.array(x).ravel()
    np.copyto(inputs[0].host, 1.0 - img / 255.0)  
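    # Transfer input data to the GPU.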
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)    
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.

    return [out.host for out in outputs]
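
For completeness, the code above assumes a HostDeviceMem helper plus an engine and execution context created elsewhere. A minimal sketch of that setup, following NVIDIA's TensorRT Python samples ("model.engine" is a placeholder file name, and pycuda.autoinit is an assumption about how the CUDA context is created):

import numpy as np
import pycuda.autoinit  # creates a CUDA context for PyCUDA
import pycuda.driver as cuda
import tensorrt as trt

class HostDeviceMem:
    # Pairs a page-locked host buffer with its matching device buffer.
    def __init__(self, host, device):
        self.host = host
        self.device = device

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()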

What is the problem?

Edit: my program is a combination of Tensorflow and TensorRT code. The error only occurs when I run

self.graph = tf.get_default_graph()
self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)

before running infer(). If I don't run those two lines, there is no problem.

1 Answer

The problem here is that I have two Python files, say tensorrtcode.py and tensorflowcode.py.

tensorrtcode.py has only the TensorRT code.

def infer(engine, x, batch_size, context):
    # ... same inference code as shown in the question above ...

def main():
    .....
    infer(......)
    .....

Then tensorflowcode.py has only the TensorFlow APIs and uses a session.

self.graph = tf.get_default_graph()
self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)
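
For illustration, a minimal sketch of how these two lines might sit inside the class used below (the class name tensorflowclass comes from the code further down; the constructor signature and the tf_config default are assumptions):

import tensorflow as tf

class tensorflowclass:
    def __init__(self, tf_config=None):
        # Keep a persistent graph and session around for repeated inference calls.
        self.graph = tf.get_default_graph()
        self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)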

The problem is that when I need to interface the TensorFlow class with the TensorRT code, I declare an instance of the TensorFlow class inside tensorrt's main as

def main():
    .....
    t_flow_code=tensorflowclass()
    infer(......)
    .....

Then I get the "illegal memory access was encountered" error at stream.synchronize().

The problem was solved by adding another session in the tensorrt code just before t_flow_code=tensorflowclass().
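
A minimal sketch of that workaround, using the names from the snippets above (the variable name extra_sess and creating the session with no arguments are assumptions; the answer only states that another session is created just before the class is instantiated):

def main():
    .....
    extra_sess = tf.Session()  # workaround: create a TF session first
    t_flow_code = tensorflowclass()
    infer(......)
    .....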

I don't understand why I need it, since I already have my own session for execution inside the TensorFlow class. Why do I need another session in the tensorrt code before interfacing with the class?

Answered 2019-06-06T13:58:02.860