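I'm trying to convert ssdLite_mobilenet_V2 from TensorFlow to TensorRT with `tf_trt`, following the instructions in this [link][1], and I get an `Aborted (core dumped)` error. What's really strange is that I did exactly the same thing (with the same program) on the same graph architecture trained on a different dataset, and it ran without errors.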

- OS: Ubuntu 18.04.2
- GPU: Tesla M60
- TensorFlow: 1.13.1

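I tried changing `max_batch_size` and `max_workspace_size_bytes`, but the problem doesn't seem to come from a GPU memory overflow: the process never seems to use more than 1.5 GB of memory.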

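```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tf_trt_models.detection import build_detection_graph  # helper from the tf_trt_models repo

# Build a frozen detection graph from the training config and checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config="pipeline.config",
    checkpoint="model.ckpt-75000"
)

# Note: graph_def is not used below; the conversion uses frozen_graph.
with tf.gfile.FastGFile('graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with TRTEngineOp nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

# Save the converted GraphDef (the process aborts before reaching this point).
with open("trt_graph.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())
```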

```
2019-04-18 12:45:50.313642: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 169 ops of 35 different types in the graph that are not converted to TensorRT: Range, GreaterEqual, Greater, Split, TopKV2, Select, Less, Slice, Identity, BiasAdd, Reshape, Mul, Fill, Squeeze, Const, Unpack, ResizeBilinear, GatherV2, NonMaxSuppressionV3, Where, ExpandDims, Cast, Minimum, Sum, Sub, Pack, Transpose, Pad, ConcatV2, Exp, Placeholder, Add, Shape, NoOp, StridedSlice, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-04-18 12:45:51.094322: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 2
2019-04-18 12:45:51.146102: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:46:15.758417: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 275 nodes succeeded.
2019-04-18 12:46:15.801363: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:47:02.994309: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 684 nodes succeeded.
2019-04-18 12:47:03.494635: F tensorflow/core/graph/graph.cc:659] Check failed: inputs[edge->dst_input()] == nullptr Edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="\310\265\2...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="\360o\021\...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, ^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}} with dst_input 0 and had pre-existing input edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="\310\265\2...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="\360o\021\...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, ^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}}
Aborted (core dumped)
```
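The `workspace_size_bytes` values in the log are consistent with the point that memory is not the problem. The snippet below only redoes arithmetic on numbers copied from the log (the two `TRTEngineOp` workspace sizes and the 275/684 segment node counts); it does not call TensorFlow. The split it reveals (half the requested budget, divided roughly in proportion to node count) is an observation from this one log, not documented TF-TRT behavior.

```python
# Budget passed to create_inference_graph in the question.
MAX_WORKSPACE = 1 << 25

# workspace_size_bytes and segment sizes copied from the log.
engine_workspace = {"TRTEngineOp_0": 4810985, "TRTEngineOp_1": 11966231}
segment_nodes = {"TRTEngineOp_0": 275, "TRTEngineOp_1": 684}

total = sum(engine_workspace.values())
print(f"requested: {MAX_WORKSPACE} bytes ({MAX_WORKSPACE / 2**20:.0f} MiB)")
print(f"assigned:  {total} bytes ({total / 2**20:.0f} MiB), "
      f"{total / MAX_WORKSPACE:.0%} of the budget")

# Compare each engine's workspace with its share of the converted nodes.
total_nodes = sum(segment_nodes.values())
for name, nodes in segment_nodes.items():
    proportional = round(total * nodes / total_nodes)
    print(f"{name}: {nodes} nodes -> {proportional} bytes "
          f"(log: {engine_workspace[name]})")
```

Both engines together are assigned 16 MiB, far below the roughly 1.5 GB the process is observed to use.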

  [1]: https://github.com/NVIDIA-AI-IOT/tf_trt_models

1 Answer


Can you retry the `create_inference_graph` call with the parameter `is_dynamic_op=True`?
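A minimal sketch of that retry, assuming the TF 1.13 `tensorflow.contrib.tensorrt.create_inference_graph` API used in the question. `trt_conversion_kwargs` and `convert` are hypothetical helper names; the other arguments are the question's own. With `is_dynamic_op=True` the TensorRT engines are built at run time from the actual input shapes rather than during conversion; whether that avoids this particular crash is not confirmed here.

```python
def trt_conversion_kwargs(frozen_graph, output_names, dynamic=True):
    """Hypothetical helper: keyword arguments for trt.create_inference_graph.

    Same settings as the question, plus is_dynamic_op, which defers TensorRT
    engine creation from conversion time to run time.
    """
    return dict(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 25,
        precision_mode='FP16',
        minimum_segment_size=50,
        is_dynamic_op=dynamic,
    )


def convert(frozen_graph, output_names):
    """Hypothetical helper: run the conversion with is_dynamic_op=True."""
    # Imported lazily so this module can be loaded without TF-TRT installed.
    import tensorflow.contrib.tensorrt as trt
    return trt.create_inference_graph(
        **trt_conversion_kwargs(frozen_graph, output_names))
```

With the question's variables this would be `trt_graph = convert(frozen_graph, output_names)`.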

It would also help to increase the verbosity of the TensorFlow logs with `TF_CPP_VMODULE=convert_graph=2,convert_nodes=2,segment=2,trt_engine=2 python ...`
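If the conversion is launched from Python rather than a shell, the same variable can be set in a child process's environment so it is present before TensorFlow's C++ runtime starts. This is a sketch using only the standard library; `verbose_env`, `run_with_trt_logging`, and the script path passed to it are hypothetical, and the module list is copied from the answer.

```python
import os
import subprocess
import sys

# Per-module VLOG levels suggested in the answer.
TRT_VMODULE = "convert_graph=2,convert_nodes=2,segment=2,trt_engine=2"


def verbose_env(base=None):
    """Return a copy of `base` (default: os.environ) with TF_CPP_VMODULE set."""
    env = dict(os.environ if base is None else base)
    env["TF_CPP_VMODULE"] = TRT_VMODULE
    return env


def run_with_trt_logging(script_path):
    """Run a conversion script in a child Python process with verbose TF-TRT logs.

    script_path is a hypothetical path to a script like the one in the question.
    """
    return subprocess.run([sys.executable, script_path], env=verbose_env())
```

For example, `run_with_trt_logging("convert_ssdlite.py")`, where the file name is only a placeholder.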

Also check with the latest TensorFlow; you can try the nightly containers from Docker Hub.

Answered 2019-04-20T06:12:21.410