
I have successfully converted my PyTorch model to a TensorRT engine, but I hit the following error during do_inference:

[TensorRT] VERBOSE: Allocated persistent device memory of size 9451520
[TensorRT] VERBOSE: QKV Clone
[TensorRT] VERBOSE: QKV Deser Start
[TensorRT] VERBOSE: QKV Deser done
[TensorRT] VERBOSE: QKV Clone done
[TensorRT] VERBOSE: Allocated activation device memory of size 19280896
[TensorRT] VERBOSE: Assigning persistent memory blocks for various profiles
Aborted (core dumped)
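
For reference, my do_inference follows the standard TensorRT Python sample pattern. A minimal sketch (buffer allocation is omitted; inputs and outputs are (host, device) buffer pairs allocated elsewhere):

import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda

def do_inference(context, bindings, inputs, outputs, stream):
    # Copy input data from host to device
    for host_mem, device_mem in inputs:
        cuda.memcpy_htod_async(device_mem, host_mem, stream)
    # Run inference on the execution context (dynamic-shape engine)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Copy results from device back to host
    for host_mem, device_mem in outputs:
        cuda.memcpy_dtoh_async(host_mem, device_mem, stream)
    stream.synchronize()
    return [host_mem for host_mem, _ in outputs]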

My model is a transformer with a QKV attention operation, and I convert that operation with the TensorRT plugin as follows:

# Look up the fused QKV-to-context attention plugin from the registry
qkv2_plg_creator = plg_registry.get_plugin_creator("CustomQKVToContextPluginDynamic", "1", "")

# Plugin fields: precision flag, hidden size 768, 12 heads, no attention mask
pf_type = trt.PluginField("type_id", np.array([fp16_mode], np.int32), trt.PluginFieldType.INT32)
pf_hidden_size = trt.PluginField("hidden_size", np.array([768], np.int32), trt.PluginFieldType.INT32)
pf_num_heads = trt.PluginField("num_heads", np.array([12], np.int32), trt.PluginFieldType.INT32)
pf_has_mask = trt.PluginField("has_mask", np.array([0], np.int32), trt.PluginFieldType.INT32)
pfc = trt.PluginFieldCollection([pf_hidden_size, pf_num_heads, pf_has_mask, pf_type])
qkv2ctx_plug = qkv2_plg_creator.create_plugin("qkv2ctx", pfc)

...
...
...

# Fused QKV projection: 3 * 768 = 2304 output channels
qkv = network.add_fully_connected(embeddings, 2304, weights, bias).get_output(0)

# Feed the projection into the plugin; its output is the attention context
qkv2ctx = network.add_plugin_v2([qkv], qkv2ctx_plug)
temporal_attention_sa = qkv2ctx.get_output(0)
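
Since the abort comes right after "Assigning persistent memory blocks for various profiles", my dynamic-shape setup may be relevant. It looks roughly like this (a sketch; the input name and shape values are placeholders, not my exact ones):

# Build-time optimization profile (placeholder input name and dims)
profile = builder.create_optimization_profile()
profile.set_shape("embeddings", (1, 1, 768), (128, 1, 768), (256, 8, 768))
config.add_optimization_profile(profile)

...

# At runtime, shapes must be bound on the context before do_inference
context.active_optimization_profile = 0
context.set_binding_shape(0, (128, 1, 768))
assert context.all_binding_shapes_specified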

Some details of my environment:

Docker container: nvcr.io/nvidia/tensorrt:20.12-py3, which includes:
tensorrt==7.2.2
cuda==11.1
python==3.8
pytorch==1.9.0
