python - A new TensorFlow device is created every time

Question

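I am trying to run the tensorflow-deeplab-v3 model on a server to segment the images I send to it. Everything works, but every time I send an image the model searches for the GPU and creates a new GPU device, and this device creation takes about 10 seconds for each image I send. How can I stop the model from creating a device every time and make it reuse the one it created before?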

I tried setting CUDA_VISIBLE_DEVICES, with the same result. I also tried creating a device explicitly and running my code on it, but the result was still the same.

I am running my server on an Amazon p2.xlarge EC2 instance. The OS information is:

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:    16.04
Codename:   xenial

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Python version: 3.5.2
pip version: 19.1.1
pip list output:

Package              Version        
-------------------- ---------------
absl-py              0.7.1          
astor                0.8.0          
bottle               0.12.16        
certifi              2019.3.9       
chardet              3.0.4          
cycler               0.10.0         
gast                 0.2.2          
get                  2019.4.13      
google-pasta         0.1.7          
grpcio               1.21.1         
h5py                 2.9.0          
idna                 2.8            
Keras-Applications   1.0.8          
Keras-Preprocessing  1.1.0          
kiwisolver           1.1.0          
Markdown             3.1.1          
matplotlib           3.0.3          
mock                 3.0.5          
numpy                1.16.4         
opencv-python        4.1.0.25       
Pillow               6.0.0          
pip                  19.1.1         
post                 2019.4.13      
protobuf             3.8.0          
public               2019.4.13      
pyparsing            2.4.0          
python-dateutil      2.8.0          
query-string         2019.4.13      
request              2019.4.13      
requests             2.22.0         
setuptools           41.0.1         
six                  1.12.0         
tb-nightly           1.14.0a20190614
tensorboard          1.14.0         
tensorflow-estimator 1.14.0         
tensorflow-gpu       1.14.0         
termcolor            1.1.0          
urllib3              1.25.3         
Werkzeug             0.15.4         
wheel                0.33.4         
wrapt                1.11.2

Output for a request after the first one:

78.181.181.107 - - [23/Jun/2019 11:18:20] "GET / HTTP/1.1" 200 0
Request arrived.
...
Writing output masks...
2019-06-23 11:22:42.036040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.036423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
2019-06-23 11:22:42.036502: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-23 11:22:42.036540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-23 11:22:42.036572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-23 11:22:42.036604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-23 11:22:42.036637: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-23 11:22:42.036669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-23 11:22:42.036702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-23 11:22:42.036776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-23 11:22:42.037430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-23 11:22:42.037448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-06-23 11:22:42.037465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-06-23 11:22:42.037643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.037953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-23 11:22:42.038233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Preparing paths...
Paths ready. (2.3365020751953125e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Generated. (9.5367431640625e-07)
Prediction took: 11.09858751296997
Cropping /home/ubuntu/Cabin/ModelOutput/test_front_mask.png
Cropped and wrote to file. (0.06068730354309082)
Preparing paths...
Paths ready. (2.4557113647460938e-05)
generating: /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Generated. (0.0004572868347167969)
Prediction took: 0.47649669647216797
Cropping /home/ubuntu/Cabin/ModelOutput/test_side_mask.png
Cropped and wrote to file. (0.06105923652648926)
Collecting trashes...
All clear! (0.000209808349609375)
Evaluation complete. (11.765886068344116)
Measuring...
Measuring complete. (1.4767637252807617)
78.181.181.107 - - [23/Jun/2019 11:22:48] "GET / HTTP/1.1" 200 0

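I embedded the inference script into my own script that runs the server, as shown below (here I download images from a fixed source for testing; the script is not finished yet). The GPU device is created at line 161 of my script, when execution enters the 'for pred_dict, image_path in zipped:' loop: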

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time
import argparse
import os
import glob
from io import BytesIO

import tensorflow as tf
import cv2

import DeepLab.tensorflow_deeplab_v3_plus.deeplab_model as deeplab_model
from DeepLab.tensorflow_deeplab_v3_plus.utils import preprocessing
from DeepLab.tensorflow_deeplab_v3_plus.utils import dataset_util

from PIL import Image
#import matplotlib.pyplot as plt

from tensorflow.python import debug as tf_debug

from bottle import run, post, request, response, route  # `response` is needed for response.status below
import requests

import Cropper
import Measure


...


# Using the Winograd non-fused algorithms provides a small performance boost.
os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

pred_hooks = None
if FLAGS.debug:
    debug_hook = tf_debug.LocalCLIDebugHook()
    pred_hooks = [debug_hook]

print("Searching for gpus...")
start = time.time()
gpus = tf.config.experimental.list_physical_devices('GPU')
end = time.time()
print("Found all gpus. ("+ str(end-start) + ")")

print("Generating model...")
start = time.time()
model = tf.estimator.Estimator(
    model_fn=deeplab_model.deeplabv3_plus_model_fn,
    model_dir=FLAGS.model_dir,
    params={
      'output_stride': FLAGS.output_stride,
      'batch_size': 1,  # Batch size must be 1 because the images' size may differ
      'base_architecture': FLAGS.base_architecture,
      'pre_trained_model': None,
      'batch_norm_decay': None,
      'num_classes': _NUM_CLASSES,
    })
end = time.time()
print("Model ready. ("+ str(end-start) + ")")

#print("Generating tensorflow session...")
#start = time.time()
#config = tf.ConfigProto()
#sess = tf.Session(config=config)
#end = time.time()
#print("Session created. ("+ str(end-start) + ")")

def evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path):
    print("Preparing list...")
    start = time.time()
    # This part looks at the Data folder and writes the names of all files in it into sample_images_list.txt
    imageList = open(image_list_dir, "w")
    for file in os.listdir(data_path):
        imageList.write(str(file)+"\n")
    imageList.close()
    end = time.time()
    print("List generated ("+ str(end-start) + ")")

    print("Loading images...")
    start = time.time()
    # This part runs the model for the current data
    examples = dataset_util.read_examples_list(FLAGS.infer_data_list)
    image_files = [os.path.join(FLAGS.data_dir, filename) for filename in examples]
    end = time.time()
    print("Images loaded ("+ str(end-start) + ")")

    with tf.device("/job:localhost/replica:0/task:0/device:GPU:0"):
        print("Inside device")
        print("Predicting...")
        start = time.time()
        predictions = model.predict(
            input_fn=lambda: preprocessing.eval_input_fn(image_files),
            hooks=pred_hooks)
        end = time.time()
        print("Predictions completed. ("+ str(end-start) + ")")

        output_dir = FLAGS.output_dir
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        print("Calling zip function...")
        start = time.time()
        zipped = zip(predictions, image_files)
        end = time.time()
        print("Zip() complete. (" + str(end-start) + ")")

        print("Zipped: " + str(zipped))

        print("Writing output masks...")
        predictionTimeStart = time.time()

        for pred_dict, image_path in zipped:
    #        print("pred_dict is: " + str(pred_dict))

            print("Preparing paths...")
            start = time.time()
            image_basename = os.path.splitext(os.path.basename(image_path))[0]
            output_filename = image_basename + '_mask.png'
            path_to_output = os.path.join(output_dir, output_filename)
            end = time.time()
            print("Paths ready. (" + str(end-start) + ")")

            print("generating:", path_to_output)
            start = time.time()
            mask = pred_dict['decoded_labels']
            end = time.time()
            print("Generated. ("+ str(end-start) + ")")

            # Use this part to also save mask
    #        tmp = Image.fromarray(mask)
    #        plt.axis('off')
    #        plt.imshow(tmp)
    #        plt.savefig(path_to_output, bbox_inches='tight')

            predictionTimeEnd = time.time()
            print("Prediction took: " + str(predictionTimeEnd - predictionTimeStart))

            print("Cropping " + path_to_output)
            start = time.time()
            Cropper.evaluate(path_to_output, cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY))
            end = time.time()
            print("Cropped and wrote to file. ("+ str(end-start) + ")")

            predictionTimeStart = time.time()

        print("Collecting trashes...")
        start = time.time()
        for file in glob.glob(data_path + "*"):
            os.remove(file)
        end = time.time()
        print("All clear! ("+ str(end-start) + ")")


@route('/')#@post('/')
def measure():
    print("Request arrived.")
    try:
        # parse input data
#        try:
#            data = request.json()
#        except:
#            raise ValueError
#
#        if data is None:
#            raise ValueError

        # extract and validate name
        try:
            id = "test"#data['id']
            front_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['front_image_url']
            side_image_url = "https://static1.squarespace.com/static/55b4a361e4b085d388b66c34/t/59709c1903596e8ea44b089e/1501482492586/"#data['side_image_url']
            height = 173#data['height']
            angle = 0#data['angle']
        except (TypeError, KeyError):
            raise ValueError

    except KeyError:
        # if name already exists, return 409 Conflict
        response.status = 409
        return

    try:
        print("Downloading images...")
        start = time.time()
        downloaded_front_image = requests.get(front_image_url)
        downloaded_side_image = requests.get(side_image_url)
        end = time.time()
        print("Download complete. ("+ str(end-start) + ")")
    except(FileNotFoundError, PermissionError, TimeoutError):
        raise ValueError

    print("Preparing images...")
    start = time.time()
    front_image = Image.open(BytesIO(downloaded_front_image.content))
    side_image = Image.open(BytesIO(downloaded_side_image.content))
    end = time.time()
    print("Images ready. ("+ str(end-start) + ")")

    print("Saving images...")
    start = time.time()
    front_image_name = data_path + str(id) + '_front.jpg'
    side_image_name = data_path + str(id) + '_side.jpg'

    front_image.save(front_image_name)
    side_image.save(side_image_name)
    end = time.time()
    print("Images saved. ("+ str(end-start) + ")")

    print("Evaluating model...")
    modelstart = time.time()
    evaluate_model(image_list_dir, inference_path, data_path, model_path, model_output_path)
    modelend = time.time()
    print("Evaluation complete. ("+ str(modelend-modelstart) + ")")

    print("Measuring...")
    start = time.time()
    Measure.evaluate(model_output_path + str(id) + "_front_mask_cropped.png", model_output_path + str(id) + "_side_mask_cropped.png", height, angle, id)
    end = time.time()
    print("Measuring complete. (" + str(end-start) + ")")

    pass

run(host=FLAGS.private_ip, port=FLAGS.port)
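The point at which the device appears in the log is consistent with how `tf.estimator.Estimator.predict` behaves in TF 1.x: it is a generator function, so `model.predict(...)` only creates the generator, and the graph, session and GPU device are built when the `for` loop first pulls a result. Every request calls `model.predict()` again, which builds a fresh graph and session, so the device creation repeats. The sketch below imitates that behaviour in plain Python; `fake_predict` and `setup_log` are hypothetical stand-ins, not TensorFlow APIs.

```python
# Sketch: a generator defers its expensive setup until the first item is requested.
# `fake_predict` is a hypothetical stand-in for Estimator.predict, not a TF API.

setup_log = []


def fake_predict(inputs):
    # This body does not run when fake_predict() is called; it runs on the first next().
    setup_log.append("create graph/session/device")  # the expensive step
    for x in inputs:
        yield {"decoded_labels": x * 2}


predictions = fake_predict([1, 2, 3])
print(len(setup_log))  # 0: nothing has been set up yet

for pred in predictions:  # setup happens here, on the first iteration
    pass
print(len(setup_log))  # 1

# A second call repeats the setup, just as each request calls model.predict() again.
list(fake_predict([4]))
print(len(setup_log))  # 2
```

Wrapping the loop in `with tf.device(...)` does not change this, because the setup runs inside the generator, in a graph the Estimator creates itself.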

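I want to minimize the time it takes to produce the output, so I would like to create the device once and then reuse it for all subsequent images.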
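Paying the setup cost once and reusing the result is usually done by building the long-lived object at server start-up (or on the first request) and keeping a reference to it, instead of rebuilding it inside the request handler. Below is a minimal, framework-agnostic sketch of that pattern, assuming the expensive step can be wrapped in a factory function. `CachedPredictor` and `build_session` are hypothetical names; in a real TF 1.x server the expensive part would be loading the model into a single long-lived `tf.Session` (for example through `tf.contrib.predictor`), which this sketch does not attempt to reproduce.

```python
import threading
import time


class CachedPredictor:
    """Builds an expensive resource once and reuses it for every request.

    `build_fn` is a hypothetical factory standing in for the costly step
    (graph construction, session creation, GPU device initialisation).
    """

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._resource = None
        self._lock = threading.Lock()  # guards against two concurrent first requests
        self.build_count = 0

    def _get(self):
        if self._resource is None:
            with self._lock:
                if self._resource is None:  # double-checked: build only once
                    self._resource = self._build_fn()
                    self.build_count += 1
        return self._resource

    def predict(self, image):
        # Every call reuses the same resource instead of rebuilding it.
        return self._get()(image)


def build_session():
    # Hypothetical stand-in for the slow setup (about 10 s per image in the question).
    time.sleep(0.1)
    return lambda image: "mask for " + image


predictor = CachedPredictor(build_session)
for name in ["test_front.jpg", "test_side.jpg", "next_front.jpg"]:
    print(predictor.predict(name))
print("builds:", predictor.build_count)  # builds: 1
```

The lock only matters if the server handles requests on several threads; with a single-threaded server the `None` check alone is enough.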
