I am testing Google Cloud ML to speed up training of my ML model with TensorFlow.

Unfortunately, Google Cloud ML seems to be extremely slow. My mainstream PC is at least 10 times faster than Google Cloud ML.

I doubted that it was using a GPU, so I ran a test: I modified a sample program to force its ops onto the GPU.

diff --git a/mnist/trainable/trainer/task.py b/mnist/trainable/trainer/task.py
index 9acb349..a64a11d 100644
--- a/mnist/trainable/trainer/task.py
+++ b/mnist/trainable/trainer/task.py
@@ -131,11 +131,12 @@ def run_training():
     images_placeholder, labels_placeholder = placeholder_inputs(
     FLAGS.batch_size)

-    # Build a Graph that computes predictions from the inference model.
-    logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
+    with tf.device("/gpu:0"):
+      # Build a Graph that computes predictions from the inference model.
+      logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)

-    # Add to the Graph the Ops for loss calculation.
-    loss = mnist.loss(logits, labels_placeholder)
+      # Add to the Graph the Ops for loss calculation.
+      loss = mnist.loss(logits, labels_placeholder)

     # Add to the Graph the Ops that calculate and apply gradients.
     train_op = mnist.training(loss, FLAGS.learning_rate)

This training code works on my PC (gcloud beta ml local train ...), but not in the cloud. There it fails with an error like this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 239, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 235, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 177, in run_training
    sess.run(init)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: Cannot assign a device to node 'softmax_linear/biases': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices:
ApplyGradientDescent: CPU
Identity: CPU
Assign: CPU
Variable: CPU
     [[Node: softmax_linear/biases = Variable[container="", dtype=DT_FLOAT, shape=[10], shared_name="", _device="/device:GPU:0"]()]]

Does Google Cloud ML support GPUs?

1 Answer

GPUs are now in Beta and available to all Cloud ML customers.

See the documentation on using GPUs with Cloud ML.
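
Until a job actually runs on a GPU-enabled machine, a hard-coded `tf.device("/gpu:0")` fails exactly as in the question. One way around that is to pick the device from the list the process reports. The sketch below is only an illustration: `pick_device` is a hypothetical helper, not part of TensorFlow or the MNIST sample, and the device names are copied from the error message in the question. In a real job the names could come from `device_lib.list_local_devices()` in `tensorflow.python.client`, assuming that module is available in the installed TensorFlow version.

```python
import re

# Matches device names whose final component is a plain GPU, e.g.
# "/device:GPU:0" or "/job:localhost/replica:0/task:0/gpu:0".
_GPU_NAME = re.compile(r"(?:^|[/:])gpu:\d+$")


def pick_device(device_names):
    """Return "/gpu:0" if any listed device is a GPU, otherwise "/cpu:0".

    Hypothetical helper for illustration; `device_names` is a list of
    device-name strings in the format TensorFlow prints in its errors.
    """
    for name in device_names:
        if _GPU_NAME.search(name.lower()):
            return "/gpu:0"
    return "/cpu:0"


# The only device the cloud job in the question reported:
cloud_devices = ["/job:localhost/replica:0/task:0/cpu:0"]
print(pick_device(cloud_devices))  # prints /cpu:0

# In task.py the result would replace the hard-coded string, e.g.:
#   with tf.device(pick_device(names)):
#       logits = mnist.inference(...)
```

A lighter alternative in TensorFlow 1.x is to create the session with `tf.ConfigProto(allow_soft_placement=True)`, which lets TensorFlow fall back to the CPU when the requested device does not exist. Either way, the job only speeds up once it is submitted to a GPU machine type as described in the documentation.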

Answered 2017-02-02T16:47:13.673