tensorflow - Keras on Google Cloud ML doesn't seem to use the GPU. Is it possible to make it work?

Question

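I am trying to run Keras with the TensorFlow backend on Cloud ML (Google Cloud Platform). Keras does not seem to be using the GPU: one epoch takes 190 seconds on my CPU, and the dumped logs from Cloud ML show the same time. Is there a way to tell whether Keras code is running on the GPU or on the CPU? Has anyone tried running Keras with the TensorFlow backend on Cloud ML?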

score 3 · Accepted Answer

Update: As of March 2017, GPUs are publicly available. See Fuyang Liu's answer.

~~GPUs are currently not available on CloudML. However, they will arrive within the next few months.~~

score 3 · Accepted Answer

Yes, it is supported now.

Basically, you need to add a file such as cloudml-gpu.yaml to your module, with the following content:

trainingInput:
  scaleTier: CUSTOM
  # standard_gpu provides 1 GPU. Change to complex_model_m_gpu for 4 GPUs
  masterType: standard_gpu
  runtimeVersion: "1.0"

Then add the option --config=trainer/cloudml-gpu.yaml (assuming your training module is in a folder named trainer). For example:

export BUCKET_NAME=tf-learn-simple-sentiment
export JOB_NAME="example_5_train_$(date +%Y%m%d_%H%M%S)"
export JOB_DIR=gs://$BUCKET_NAME/$JOB_NAME
export REGION=europe-west1

gcloud ml-engine jobs submit training $JOB_NAME \
  --job-dir gs://$BUCKET_NAME/$JOB_NAME \
  --runtime-version 1.0 \
  --module-name trainer.example5-keras \
  --package-path ./trainer \
  --region $REGION \
  --config=trainer/cloudml-gpu.yaml \
  -- \
  --train-file gs://tf-learn-simple-sentiment/sentiment_set.pickle

You may also want to check this URL for the regions where GPUs are available and for other information.

score 1 · Accepted Answer

import keras.backend.tensorflow_backend as K
K._set_session(K.tf.Session(config=K.tf.ConfigProto(log_device_placement=True)))

This should make Keras print the device placement of every tensor to stdout or stderr.
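
Once device placement is being logged, the dumped job logs can be scanned to see how many ops landed on each device type. The following is only a rough sketch: the helper `count_placements` is hypothetical, and the sample lines are illustrative rather than real job output. It assumes each placement line ends in a device string such as `/gpu:0` or `/device:GPU:0`; the exact log format differs between TensorFlow versions, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumed format: a placement line ends with a device string such as
# "/gpu:0", "/cpu:0" or "/device:GPU:0". Adjust for your TensorFlow version.
DEVICE_RE = re.compile(r"/(?:device:)?(cpu|gpu):(\d+)\s*$", re.IGNORECASE)


def count_placements(log_lines):
    """Count log lines per device type ("CPU" or "GPU").

    Hypothetical helper: lines that do not end in a device string
    (for example training progress output) are ignored.
    """
    counts = Counter()
    for line in log_lines:
        match = DEVICE_RE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


# Illustrative lines only, not copied from a real Cloud ML job.
sample_log = [
    "MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0",
    "dense_1_W: (Variable): /job:localhost/replica:0/task:0/gpu:0",
    "save/Const: (Const): /job:localhost/replica:0/task:0/cpu:0",
    "Epoch 1/10",
]

placements = count_placements(sample_log)
print(dict(placements))
```

If the GPU count stays at zero for a job submitted with the GPU config, the job is most likely still running on the CPU.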
