tensorflow - Slow loading time - EfficientDet D2

Question

I am loading the TensorFlow 2 version of EfficientDet D2 ( http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz ) on a Jetson AGX Xavier.

I run the following script:

#!/usr/bin/python3
import tensorflow as tf
import time
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

PATH_TO_SAVED_MODEL = "./efficientdet_d2_coco17_tpu-32/saved_model/"

print('Loading model...')
start_time = time.time()

# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))

However, loading takes more than 13 minutes. This is the output after running the command:

./test.py
2021-07-04 10:58:58.074413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 10:59:05.375568: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
Loading model...
2021-07-04 11:00:54.337115: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-04 11:00:54.342226: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-04 11:00:54.347726: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.347959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.348037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.353788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.358471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.359514: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.364904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.369140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.369861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.370262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.370843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.371060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:00:54.375404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.375623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.375714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.375823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.375908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.376011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.376090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.376167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.376287: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.376369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.376673: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.376972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.377093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:05:01.847060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 11:05:01.847174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-07-04 11:05:01.847226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-07-04 11:05:01.847710: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.849096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19271 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-07-04 11:05:01.850298: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Done! Took 793.8719098567963 seconds

Given the Xavier's compute power, shouldn't I expect better performance? Does anyone know what could be causing this?

Thanks for any help or comments!

score 0 · Accepted Answer

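The time is spent not only on loading the model but also on initializing the device. The problem may lie in the driver. To verify this, try initializing a much smaller model, or a toy example such as a+b=c. I expect it to take a similar amount of time.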
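The toy check suggested above could look roughly like the sketch below. It is only an illustration: the helper names `timed` and `toy_add` are made up for this example, and it assumes TensorFlow 2 with eager execution. If the first `a + b` already takes minutes, the delay comes from device initialization rather than from EfficientDet itself.

```python
import time


def timed(label, fn):
    """Run fn(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print('{}: {:.2f} s'.format(label, elapsed))
    return result, elapsed


def toy_add():
    # The import is cached after the first call, so it costs nothing here.
    import tensorflow as tf
    # Creating the first tensor makes TensorFlow set up its runtime,
    # including any visible GPU device.
    a = tf.constant(1.0)
    b = tf.constant(2.0)
    return (a + b).numpy()


def main():
    try:
        # Time the import on its own so it is not mixed up with device init.
        timed('import tensorflow', lambda: __import__('tensorflow'))
    except ImportError:
        print('TensorFlow is not installed; nothing to measure.')
        return
    # The first call pays for device initialization; the second should be fast.
    timed('first a + b (includes device init)', toy_add)
    timed('second a + b', toy_add)


if __name__ == '__main__':
    main()
```

Running `tf.saved_model.load` after this warm-up and timing it the same way would show how much of the 13 minutes belongs to the model itself.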

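Also, compute capability has nothing to do with loading the model. Model loading depends more on the driver and on TF's memory management. The actual construction of the model in memory is probably done on the CPU, even when a GPU or other accelerator is used (just a guess).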

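In my experience with CUDA and TF, initialization took 5 minutes with one combination of CUDA, TF, and GPU driver versions. On the same hardware (8x1080ti GPUs), a different version of CUDA and TF took less than 30 seconds.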
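Since the version combination seems to matter, it may help to record it next to each timing. The following is a minimal sketch, assuming a TensorFlow release that provides `tf.sysconfig.get_build_info()` (2.3 or later, as far as I know); `describe_build` is a hypothetical helper, and the dictionary keys may differ between builds, so they are read with `.get()`. The GPU driver version is not reported here and has to be checked separately.

```python
def describe_build(tf_version, build_info, gpu_names):
    """Format the version details that matter when comparing startup times."""
    lines = [
        'TensorFlow: {}'.format(tf_version),
        # .get() is used because the available keys vary between builds.
        'CUDA (build): {}'.format(build_info.get('cuda_version', 'unknown')),
        'cuDNN (build): {}'.format(build_info.get('cudnn_version', 'unknown')),
        'Visible GPUs: {}'.format(', '.join(gpu_names) or 'none'),
    ]
    return '\n'.join(lines)


def main():
    try:
        import tensorflow as tf
    except ImportError:
        print('TensorFlow is not installed.')
        return
    # Assumption: get_build_info() exists in the installed TensorFlow version.
    info = dict(tf.sysconfig.get_build_info())
    gpus = [d.name for d in tf.config.list_physical_devices('GPU')]
    print(describe_build(tf.__version__, info, gpus))


if __name__ == '__main__':
    main()
```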
