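I'm trying to run the tensorflow-deeplab-v3 model on a server to segment images that I send to it. Everything works, but every time I send an image the model looks for GPUs and creates a new GPU device, and this device creation takes about 10 seconds for each image I send. How can I prevent the model from creating a device every time and have it reuse the previously created one instead?
I tried setting CUDA_VISIBLE_DEVICES, but the result was the same. I also tried creating a device and running my code under it, but again the result was the same.
I'm running my server on an Amazon p2.xlarge EC2 instance. The OS information is: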
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 35C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Python version: 3.5.2. pip version: 19.1.1. pip list output:
Package Version
-------------------- ---------------
absl-py 0.7.1
astor 0.8.0
bottle 0.12.16
certifi 2019.3.9
chardet 3.0.4
cycler 0.10.0
gast 0.2.2
get 2019.4.13
google-pasta 0.1.7
grpcio 1.21.1
h5py 2.9.0
idna 2.8
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
Markdown 3.1.1
matplotlib 3.0.3
mock 3.0.5
numpy 1.16.4
opencv-python 4.1.0.25
Pillow 6.0.0
pip 19.1.1
post 2019.4.13
protobuf 3.8.0
public 2019.4.13
pyparsing 2.4.0
python-dateutil 2.8.0
query-string 2019.4.13
request 2019.4.13
requests 2.22.0
setuptools 41.0.1
six 1.12.0
tb-nightly 1.14.0a20190614
tensorboard 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
urllib3 1.25.3
Werkzeug 0.15.4
wheel 0.33.4
wrapt 1.11.2
Output for a request made after the first one:
78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0
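I embedded the inference script into my own script that runs the server, as shown below (for testing I download the images from a fixed source here, and the script is not fully finished yet). It creates the GPU device at line 161, on entering the 'for pred_dict, image_path in zipped:' loop: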
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import argparse
import os
import glob
from io import BytesIO
import tensorflow as tf
import cv2
import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util
from PIL import Image
#import matplotlib.pyplot as plt
from tensorflow.python import debug as tf_debug
from bottle import run, post, request, response, route
import requests
import Cropper
import Measure
...
# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]
print("Searching for gpus...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all gpus. ("+ str(end-start) + ")")
print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
        'output_stride': FLAGS.output_stride,
        'batch_size': 1,  # Batch size must be 1 because the images' size may differ
        'base_architecture': FLAGS.base_architecture,
        'pre_trained_model': None,
        'batch_norm_decay': None,
        'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. ("+ str(end-start) + ")")
#print("Generating tensorflow session...")
#start = time.time()
#config = tf.ConfigProto()
#sess = tf.Session(config=config)
#end = time.time()
#print("Session created. ("+ str(end-start) + ")")
def evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path):
    print("Preparing list...")
    start = time.time()
    # This part looks at the Data folder and writes the names of all files in it into sample_images_list.txt
    imageList = open(image_list_dir, "w")
    for file in os.listdir(data_path):
        imageList.write(str(file)+"\n")
    imageList.close()
    end = time.time()
    print("List generated ("+ str(end-start) + ")")
    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded ("+ str(end-start) + ")")
    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Predictions completed. ("+ str(end-start) + ")")
    output_dir = FLAGS.output_dir
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    print("Calling zip function...")
    start = time.time()
    zipped = zip(predictions, image_files)
    end = time.time()
    print("Zip() complete. (" + str(end-start) + ")")
    print("Zipped: " + str(zipped))
    print("Writing output masks...")
    predictionTimeStart = time.time()
    for pred_dict, image_path in zipped:
        # print("pred_dict is: " + str(pred_dict))
        print("Preparing paths...")
        start = time.time()
        image_basename = os.path.splitext(os.path.basename(image_path))[0]
        output_filename = image_basename + '_mask.png'
        path_to_output = os.path.join(output_dir, output_filename)
        end = time.time()
        print("Paths ready. (" + str(end-start) + ")")
        print("generating:", path_to_output)
        start = time.time()
        mask = pred_dict['decoded_labels']
        end = time.time()
        print("Generated. ("+ str(end-start) + ")")
        # Use this part to also save mask
        # tmp = Image.fromarray(mask)
        # plt.axis('off')
        # plt.imshow(tmp)
        # plt.savefig(path_to_output, bbox_inches='tight')
        predictionTimeEnd = time.time()
        print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))
        print("Cropping " + path_to_output)
        start = time.time()
        Cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
        end = time.time()
        print("Cropped and wrote to file. ("+ str(end-start) + ")")
        predictionTimeStart = time.time()
    print("Collecting trashes...")
    start = time.time()
    for file in glob.glob(data_path + "*"):
        os.remove(file)
    end = time.time()
    print("All clear! ("+ str(end-start) + ")")
@route('/')#@post('/')
def measure():
    print("Request arrived.")
    try:
        # parse input data
        # try:
        #     data = request.json()
        # except:
        #     raise ValueError
        #
        # if data is None:
        #     raise ValueError
        # extract and validate name
        try:
            id = "test"#data['id']
            front_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['front_image_url']
            side_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['side_image_url']
            height = 173#data['height']
            angle = 0#data['angle']
        except (TypeError, KeyError):
            raise ValueError
    except KeyError:
        # if name already exists, return 409 Conflict
        response.status = 409
        return
    try:
        print("Downloading images...")
        start = time.time()
        downloaded_front_image = requests.get(front_image_url)
        downloaded_side_image = requests.get(side_image_url)
        end = time.time()
        print("Download complete. ("+ str(end-start) + ")")
    except(FileNotFoundError, PermissionError, TimeoutError):
        raise ValueError
    print("Preparing images...")
    start = time.time()
    front_image = Image.open(BytesIO(downloaded_front_image.content))
    side_image = Image.open(BytesIO(downloaded_side_image.content))
    end = time.time()
    print("Images ready. ("+ str(end-start) + ")")
    print("Saving images...")
    start = time.time()
    front_image_name = data_path + str(id) + '_front.jpg'
    side_image_name = data_path + str(id) + '_side.jpg'
    front_image.save(front_image_name)
    side_image.save(side_image_name)
    end = time.time()
    print("Images saved. ("+ str(end-start) + ")")
    print("Evaluating model...")
    modelstart = time.time()
    evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path)
    modelend = time.time()
    print("Evaluation complete. ("+ str(modelend-modelstart) + ")")
    print("Measuring...")
    start = time.time()
    Measure.evaluate(model_output_path + str(id) + "_front_mask_cropped.png", model_output_path + str(id) + "_side_mask_cropped.png", height, angle, id)
    end = time.time()
    print("Measuring complete. (" + str(end-start) + ")")
    pass
run(host=FLAGS.private_ip, port=FLAGS.port)
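A possible explanation for why the device shows up at the loop rather than at the model.predict(...) call: Estimator.predict returns a generator, so its setup work would only start on the first iteration. The TensorFlow-free sketch below only illustrates that ordering. `fake_predict` and `events` are hypothetical stand-ins and are not part of the script above.

```python
events = []

def fake_predict(paths):
    # Stand-in for a lazy predict call: this body does not run
    # until the caller asks for the first item.
    events.append("setup")
    for p in paths:
        yield p + "_mask"

predictions = fake_predict(["front", "side"])   # returns immediately
zipped = zip(predictions, ["front", "side"])    # also returns immediately
events.append("zip done")

first = next(zipped)  # the "setup" step runs here, on first iteration
print(events)
print(first)
```

This would match the log above, where the device messages appear after "Writing output masks..." and only the first image in the loop is slow.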
I want to minimize the response time, so I'd like to create the device once and then reuse that same device for subsequent images.
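To make the goal concrete, here is a minimal sketch of the "set up once, reuse afterwards" pattern I have in mind, written without TensorFlow so it runs anywhere. `PersistentPredictor`, `start_stream`, `fake_stream`, and `setup_calls` are hypothetical names. Whether model.predict can actually be driven this way (for example, through an input function that reads from a generator) is an assumption I have not verified.

```python
import queue

class PersistentPredictor:
    """Keeps a single prediction stream alive across many requests."""

    def __init__(self, start_stream):
        # start_stream: callable that takes an iterator of inputs and
        # returns an iterator of outputs, one output per input, in order.
        self._start_stream = start_stream
        self._inputs = queue.Queue()
        self._outputs = None  # created lazily on the first request

    def _feed(self):
        # Endless input stream: blocks until the next request arrives.
        # A None item ends the stream.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def predict(self, item):
        if self._outputs is None:
            self._outputs = self._start_stream(self._feed())
        self._inputs.put(item)
        return next(self._outputs)

    def close(self):
        self._inputs.put(None)


setup_calls = []

def fake_stream(inputs):
    # Stand-in for expensive one-time setup followed by per-item work.
    setup_calls.append("setup")
    for x in inputs:
        yield x * 2

predictor = PersistentPredictor(fake_stream)
results = [predictor.predict(n) for n in (1, 2, 3)]
print(results, len(setup_calls))
```

In this sketch the setup step runs once for three requests. It is not thread-safe as written, so it assumes requests are handled one at a time.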