I'm trying to launch a notebook on Google Cloud AI Platform using a custom image. I followed the approach described here:
https://cloud.google.com/ai-platform/deep-learning-containers/docs/derivative-container
So, to build and push the Docker image:
gcloud auth configure-docker
export PROJECT=$(gcloud config list project --format "value(core.project)")
docker build . -f Dockerfile -t "gcr.io/${PROJECT}/my-custom-image:latest"
docker push "gcr.io/${PROJECT}/my-custom-image:latest"
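A quick way to confirm that the push actually landed in the registry the notebook will pull from is to list the repository's tags. The sketch below assumes gcloud is authenticated against the same project used for the push. The helper names `image_uri` and `list_pushed_tags` are hypothetical, and the default image name and tag simply mirror the commands above.

```shell
#!/usr/bin/env bash
# Sketch: confirm the pushed tag is visible in Container Registry.
# Assumes gcloud is authenticated for the same project used for the push.
set -euo pipefail

# Hypothetical pure helper: build the image reference used in the commands above.
image_uri() {
  local project="$1" name="${2:-my-custom-image}" tag="${3:-latest}"
  printf 'gcr.io/%s/%s:%s\n' "$project" "$name" "$tag"
}

# Hypothetical helper: list the tags stored for the repository.
# "latest" should appear in the TAGS column if the push succeeded.
list_pushed_tags() {
  local project="$1" name="${2:-my-custom-image}"
  gcloud container images list-tags "gcr.io/${project}/${name}"
}
```

Usage: `list_pushed_tags "$PROJECT"` after the `docker push`.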
However, when I try to connect to a notebook instance that uses this image with
gcloud compute --project "myproject" ssh --zone "myzone" "custom-test" -- -L 8080:localhost:8080
I get:
ssh: connect to host XXX.XXX.XXX.XXX port 22: Connection refused
This happens even if I use the base image without any changes, for example with this Dockerfile:
FROM gcr.io/deeplearning-platform-release/base-cpu:latest
If I launch a notebook instance directly from gcr.io/deeplearning-platform-release/base-cpu:latest, I can connect to it as expected.
Edit 1: From the serial port 1 log:
May 9 16:51:31 custom-test GCEGuestAgent[673]: 2020-05-09T16:51:31.7524Z GCEGuestAgent Info: Updating keys for user MYUSER.
[ 206.144111] google_guest_agent[673]: 2020/05/09 16:51:33 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
May 9 16:51:33 custom-test google_guest_agent[673]: 2020/05/09 16:51:33 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
May 9 16:53:25 custom-test ntpd[707]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
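It looks like a permission error, but I'm not sure why I would lack permission to deploy an image pushed from the same account. Could it be related to this line?
custom-test ntpd[707]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized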
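To watch whether these PermissionDenied lines keep appearing without opening the console, the serial port 1 output can be fetched with gcloud and filtered. This is a sketch: the helper names are hypothetical, and the instance, zone and project values in the usage line are placeholders taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: fetch the serial port 1 log of the notebook VM and keep only the
# PermissionDenied lines. Instance/zone/project values are placeholders.
set -euo pipefail

# Hypothetical pure filter: print stdin lines that contain PermissionDenied.
# "|| true" keeps the exit status 0 when nothing matches.
filter_permission_errors() {
  grep 'PermissionDenied' || true
}

# Hypothetical helper: dump serial port 1 and pipe it through the filter.
show_permission_errors() {
  local instance="$1" zone="$2" project="$3"
  gcloud compute instances get-serial-port-output "$instance" \
    --zone "$zone" --project "$project" --port 1 | filter_permission_errors
}
```

Usage: `show_permission_errors custom-test myzone myproject`.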
Edit 2: Now, about an hour later, I can connect (without having changed anything). But when I open localhost:8080, I get:
channel 4: open failed: connect failed: Connection refused
channel 3: open failed: connect failed: Connection refused
as output in the attached console.
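"channel N: open failed: connect failed: Connection refused" is printed when the SSH server on the VM cannot connect to the forwarded target (here localhost:8080 on the VM), so it is worth checking whether anything is listening on that port at all. The sketch below runs `ss -ltn` on the VM through `gcloud compute ssh --command`. The helper names are hypothetical, and the instance/zone/project values are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: check whether any socket on the VM listens on the forwarded port.
# Instance/zone/project values are placeholders.
set -euo pipefail

# Hypothetical pure helper: read `ss -ltn` output on stdin and succeed
# if some LISTEN socket's local address ends in ":PORT".
is_listening() {
  local port="$1"
  awk -v port="$port" '$1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
       END { exit !found }'
}

# Hypothetical helper: run ss on the VM over SSH and report the result.
# Note: an SSH failure is also reported as "nothing is listening".
remote_port_check() {
  local instance="$1" zone="$2" project="$3" port="$4"
  if gcloud compute ssh "$instance" --zone "$zone" --project "$project" \
       --command 'ss -ltn' | is_listening "$port"; then
    echo "something is listening on ${port}"
  else
    echo "nothing is listening on ${port}"
  fi
}
```

Usage: `remote_port_check custom-test myzone myproject 8080`.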
From the serial port 1 log:
May 9 18:04:36 custom-test systemd[1]: Started Session 4 of user MYUSER.
May 9 18:04:36 custom-test GCEGuestAgent[673]: 2020-05-09T18:04:36.5636Z GCEGuestAgent Info: Updating keys for user MYUSER.
May 9 18:04:37 custom-test google_guest_agent[673]: 2020/05/09 18:04:37 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
[ 4590.862794] google_guest_agent[673]: 2020/05/09 18:04:37 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
Edit 3: Launching the image as a VM results in:
[ 26.315675] konlet-startup[535]: 2020/05/09 19:34:57 Launching user container 'gcr.io/myproject/my-custom-image:latest'
[ 26.315713] konlet-startup[535]: 2020/05/09 19:34:57 Configured container 'instance-1-test' will be started with name 'klt-instance-1-test-azmb'.
[ 26.315740] konlet-startup[535]: 2020/05/09 19:34:57 Pulling image: 'gcr.io/myproject/my-custom-image:latest'
[ 26.839555] konlet-startup[535]: 2020/05/09 19:34:57 Error: Failed to start container: Error response from daemon: {"message":"pull access denied for gcr.io/myproject/my-custom-image, repository does not exist or may require 'docker login': denied: Permission denied for \"latest\" from request \"/v2/myproject/my-custom-image/manifests/latest\". "}
[ 26.839839] konlet-startup[535]: 2020/05/09 19:34:57 Saving welcome script to profile.d
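The "pull access denied" error comes from the VM's own Docker daemon, so the relevant identity is the VM's service account rather than the account that pushed the image. Images under gcr.io/PROJECT are stored in the Cloud Storage bucket artifacts.PROJECT.appspot.com. The sketch below prints the VM's service account and that bucket's IAM policy so the two can be compared. It assumes the plain gcr.io host and a non-domain-scoped project ID, since regional hosts and domain-scoped projects use different bucket names. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show which service account the VM pulls with and the IAM policy of
# the bucket behind gcr.io/PROJECT. Assumes the gcr.io host and a
# non-domain-scoped project ID. Instance/zone/project values are placeholders.
set -euo pipefail

# Hypothetical pure helper: bucket that stores images pushed to gcr.io/PROJECT.
gcr_bucket() {
  printf 'gs://artifacts.%s.appspot.com\n' "$1"
}

# Hypothetical helper: print the VM's first service account, then the bucket policy.
inspect_pull_access() {
  local instance="$1" zone="$2" project="$3"
  gcloud compute instances describe "$instance" --zone "$zone" --project "$project" \
    --format='value(serviceAccounts[0].email)'
  gsutil iam get "$(gcr_bucket "$project")"
}
```

Usage: `inspect_pull_access instance-1-test myzone myproject`.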