google-dl-platform - Deep Learning VM cannot be deployed through the UI - image resource not found - typo in the image URL

Question

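I am trying to deploy the Marketplace solution Deep Learning VM (Google Click to Deploy) with TF 2.0 and a GPU. I use the UI to select the zone and the other instance options.

However, once I deploy and land on the Deployment Manager screen, I get the following error: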

jupyterlab-eu-w-4c-vm: {"ResourceType":"compute.v1.instance","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"errors":[{"domain":"global","message":"Invalid value for field 'resource.disks[0].initializeParams.sourceImage': 'https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821'. The referenced image resource cannot be found.","reason":"invalid"}],"message":"Invalid value for field 'resource.disks[0].initializeParams.sourceImage': 'https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821'. The referenced image resource cannot be found.","statusMessage":"Bad Request","requestPath":"https://compute.googleapis.com/compute/v1/projects/jupyterlab-instance/zones/europe-west4-c/instances","httpMethod":"POST"}}

The key point is that the image resource cannot be found at this URL:

https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821

I searched for the available images in Cloud Shell:

@cloudshell:~ (jupyterlab-instance)$ gcloud compute images list --project click-to-deploy-images --no-standard-images --uri | grep tf-2-0-cu100
https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100--experimental-20190821

Note that the URLs differ: the actual image name has an extra "-" compared with the one the deployment script tries to fetch:

tf-2-0-cu100-experimental-20190821
tf-2-0-cu100--experimental-20190821

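This looks like an unintentional typo.

My question is: how can I deploy this VM? Is there a way to modify the deployment script that the UI generates before it is deployed, or do I need to do the whole deployment through the CLI in order to add the extra "-"?

Is there somewhere I can report this so that someone fixes the typo? I assume it blocks anyone who tries to deploy a TensorFlow 2 GPU instance with the Deep Learning VM through the UI.

Thanks for your help.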

score 1 · Accepted Answer

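I ran into the same problem. The VM will not deploy with the TF 2.0 version because the boot image URL appears to be malformed. It is not related to the zone (I tried deploying without a GPU and in different zones, and it still failed).

One workaround is to create the instance directly from the image (see the documentation):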

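# I used the CPU image family myself, but tf2-latest-gpu looks like the right one for a GPU instance.
# GPU instances cannot live-migrate, so --maintenance-policy=TERMINATE is required.
gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=tf2-latest-gpu \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator=count=1,type=nvidia-tesla-k80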

Add whatever other options you need (GPU, etc.).

You can get help for this command with:

gcloud compute instances create --help

To list all available images, use:

gcloud compute images list --project deeplearning-platform-release --no-standard-images
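
If you would rather pin the exact image from the question than use an image family, the listing can be filtered and the image name pulled out of the URI. This is only a sketch: `image_name_from_uri` and `find_image_uris` are hypothetical helper names, and it assumes your account is allowed to create instances from images in `click-to-deploy-images`, which I have not verified.

```shell
# Strip everything up to the last "/" of an image URI, leaving the image name.
image_name_from_uri() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: print the image URIs in a project that contain a pattern.
find_image_uris() {
  project="$1"
  pattern="$2"
  gcloud compute images list --project "$project" --no-standard-images --uri \
    | grep -- "$pattern"
}

# Example usage (needs gcloud and an authenticated project, so it is left commented out):
#   uri=$(find_image_uris click-to-deploy-images tf-2-0-cu100 | head -n 1)
#   gcloud compute instances create "$INSTANCE_NAME" \
#     --zone="$ZONE" \
#     --image="$(image_name_from_uri "$uri")" \
#     --image-project=click-to-deploy-images
```

Taking the name from the listing instead of typing it avoids repeating the single-hyphen/double-hyphen mismatch by hand.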

score 0 · Accepted Answer

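I had a very similar problem, and it turned out that I was trying to deploy a GPU model in a zone that does not support it. Check whether "europe-west4-c" supports the GPU type you are using. For example, if you are using a K80, it is not available in that zone.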
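
The same check can be done from the command line instead of the documentation table. A minimal sketch, assuming gcloud is installed and authenticated; `zone_has_accelerator` is a hypothetical helper name, not part of gcloud:

```shell
# Hypothetical helper: succeed (exit 0) if the accelerator type is offered in the zone.
zone_has_accelerator() {
  zone="$1"
  accel="$2"
  # List the accelerator type names for the zone, one per line,
  # and look for an exact whole-line match.
  gcloud compute accelerator-types list \
    --filter="zone:( $zone )" \
    --format="value(name)" \
    | grep -qx -- "$accel"
}

# Example usage (left commented out because it calls the real API):
#   if zone_has_accelerator europe-west4-c nvidia-tesla-k80; then
#     echo "K80 is available in europe-west4-c"
#   else
#     echo "K80 is not available in europe-west4-c"
#   fi
```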
