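Is it possible to specify the VM boot image in Nextflow when using the Google Life Sciences API? Specifically, I mean the bootImage parameter described here: https://cloud.google.com/life-sciences/docs/reference/rest/v2beta/projects.locations.pipelines/run#virtualmachine

Edit

Here is the reason: when I try to spawn several workers that use GPUs, I get the following error message: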

(more omitted..)
+ NVIDIA_DRIVER_VERSION=450.51.06
+ NVIDIA_DRIVER_MD5SUM=
+ NVIDIA_INSTALL_DIR_HOST=/var/lib/nvidia
+ NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
+ ROOT_MOUNT_DIR=/root
+ CACHE_FILE=/usr/local/nvidia/.cache
+ LOCK_FILE=/root/tmp/cos_gpu_installer_lock
+ LOCK_FILE_FD=20
+ set +x
[INFO    2021-02-24 18:27:39 UTC] PRELOAD: false
[INFO    2021-02-24 18:27:39 UTC] Running on COS build id 13310.1209.10
[INFO    2021-02-24 18:27:39 UTC] Data dependencies (e.g. kernel source) will be fetched from https://storage.googleapis.com/cos-tools/13310.1209.10
[INFO    2021-02-24 18:27:39 UTC] Getting the kernel source repository path.
[INFO    2021-02-24 18:27:39 UTC] Obtaining kernel_info file from https://storage.googleapis.com/cos-tools/13310.1209.10/kernel_info
[INFO    2021-02-24 18:27:40 UTC] Downloading kernel_info file from https://storage.googleapis.com/cos-tools/13310.1209.10/kernel_info

real    0m0.079s
user    0m0.014s
sys    0m0.004s
[INFO    2021-02-24 18:27:40 UTC] Checking if this is the only cos-gpu-installer that is running.
[INFO    2021-02-24 18:27:40 UTC] Checking if third party kernel modules can be installed
[INFO    2021-02-24 18:27:40 UTC] Checking cached version
[INFO    2021-02-24 18:27:40 UTC] Cache file /usr/local/nvidia/.cache not found.
[INFO    2021-02-24 18:27:40 UTC] Did not find cached version, building the drivers...
[INFO    2021-02-24 18:27:40 UTC] Downloading GPU installer ... 
[INFO    2021-02-24 18:27:40 UTC] Downloading from https://storage.googleapis.com/nvidia-drivers-eu-public/nvidia-cos-project/85/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_85-13310-1209-10.cos
[INFO    2021-02-24 18:27:40 UTC] Downloading GPU installer from https://storage.googleapis.com/nvidia-drivers-eu-public/nvidia-cos-project/85/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_85-13310-1209-10.cos

real    0m0.811s
user    0m0.175s
sys    0m0.495s
[INFO    2021-02-24 18:27:41 UTC] Setting up compilation environment
[INFO    2021-02-24 18:27:41 UTC] Obtaining toolchain_env file from https://storage.googleapis.com/cos-tools/13310.1209.10/toolchain_env
[INFO    2021-02-24 18:27:41 UTC] Downloading toolchain_env file from https://storage.googleapis.com/cos-tools/13310.1209.10/toolchain_env

real    0m0.021s
user    0m0.013s
sys    0m0.003s
[INFO    2021-02-24 18:27:41 UTC] Found toolchain path file locally
ls: cannot access '/build/cos-tools': No such file or directory
[INFO    2021-02-24 18:27:41 UTC] /build/cos-tools: 
ls: cannot access '/build/cos-tools': No such file or directory
[INFO    2021-02-24 18:27:41 UTC] Downloading toolchain from https://storage.googleapis.com/chromiumos-sdk/2020/06/x86_64-cros-linux-gnu-2020.06.25.065836.tar.xz
[INFO    2021-02-24 18:27:41 UTC] Downloading toolchain archive from https://storage.googleapis.com/chromiumos-sdk/2020/06/x86_64-cros-linux-gnu-2020.06.25.065836.tar.xz
curl: (16) Error in the HTTP2 framing layer

real    0m2.403s
user    0m0.580s
sys    0m1.461s
[ERROR   2021-02-24 18:27:44 UTC] Could not download toolchain archive from https://storage.googleapis.com/chromiumos-sdk/2020/06/x86_64-cros-linux-gnu-2020.06.25.065836.tar.xz, giving up.

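So it is something about a toolchain that cannot be found or installed. However, the problem occurs randomly. Sometimes I spawn 72 workers and everything is fine; other times I get this error.

I thought changing the parameter I mentioned would fix this, but you are right, it probably won't.

I did some digging but found almost nothing relevant. The only related thread I could find is https://github.com/DataBiosphere/dsub/issues/215, but no solution was posted there either.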


1 Answer


Not currently (as of v21.02.0-edge); see here:

https://github.com/nextflow-io/nextflow/blob/v21.02.0-edge/plugins/nf-google/src/main/nextflow/cloud/google/lifesciences/GoogleLifeSciencesConfig.groovy

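From the documentation, the bootImage option seems to have only a limited use case, and it probably would not make much sense for containerized workflows anyway:

bootImage

The host operating system image to use.

Currently, only Container-Optimized OS images may be used.

The default value is projects/cos-cloud/global/images/family/cos-stable, which selects the latest stable release of Container-Optimized OS.

This option is provided to allow testing against the beta release of the operating system to ensure that the new version does not interact negatively with production pipelines.

To test a pipeline against the beta release of Container-Optimized OS, use the value projects/cos-cloud/global/images/family/cos-beta.

Is there some reason you need a boot image other than the latest stable release to run your workflow? A newer Docker, perhaps?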
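
For reference, here is roughly where bootImage sits if you call the Life Sciences API directly rather than through Nextflow. This is only a sketch of a pipelines.run request body built as a plain Python dict. The helper name build_run_request, the machine type, the region, the container image and the GPU type are all placeholder assumptions, not values taken from your setup, and nothing here is something Nextflow exposes.

```python
import json


def build_run_request(image_uri, commands, boot_image, region,
                      machine_type="n1-standard-4",
                      gpu_type=None, gpu_count=0):
    """Build a request body for projects.locations.pipelines.run (v2beta).

    Hypothetical helper: it only assembles the JSON structure and does
    not send anything to Google Cloud.
    """
    vm = {
        "machineType": machine_type,
        # The field the question is about; it must be a COS image.
        "bootImage": boot_image,
    }
    if gpu_type and gpu_count > 0:
        # The REST reference documents "count" as an int64 string.
        vm["accelerators"] = [{"type": gpu_type, "count": str(gpu_count)}]
    return {
        "pipeline": {
            "actions": [{"imageUri": image_uri, "commands": list(commands)}],
            "resources": {"regions": [region], "virtualMachine": vm},
        }
    }


if __name__ == "__main__":
    body = build_run_request(
        image_uri="ubuntu",                      # placeholder container
        commands=["nvidia-smi"],                 # placeholder command
        boot_image="projects/cos-cloud/global/images/family/cos-beta",
        region="europe-west1",                   # placeholder region
        gpu_type="nvidia-tesla-t4",              # placeholder GPU type
        gpu_count=1,
    )
    print(json.dumps(body, indent=2))
```

Even with direct API access, switching from cos-stable to cos-beta only changes which COS build the VM boots, so it may or may not affect the driver installation step.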


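Edit:

I just had a look at the code that I think is being run here. I'm not sure I have the right version, but I'm also not sure that really matters: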

https://github.com/GoogleCloudPlatform/cos-gpu-installer/blob/v20210204/cos-gpu-installer-docker/entrypoint.sh#L299-L324

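I think the "Could not download toolchain archive from ..." message is accurate, and we can see curl report "Error in the HTTP2 framing layer". Why? If it only happens occasionally, I would guess it is just a download/timeout error. That said, an error in the HTTP2 framing layer is a bit strange. I'm not sure exactly what it means, or why using HTTP version 2 would be a problem only some of the time. I think your best bet is to open an issue here: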

https://github.com/GoogleCloudPlatform/cos-gpu-installer/issues
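
If this really is a transient download failure, the usual remedy is to retry with a growing delay. The sketch below shows that general pattern only. It is not the cos-gpu-installer code, and retry_with_backoff and operation are hypothetical names standing in for whatever call fails intermittently.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts are exhausted.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    The last failure is re-raised so the caller still sees the error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_download():
        # Simulated download that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise IOError("simulated transient failure")
        return "toolchain.tar.xz"

    print(retry_with_backoff(flaky_download, base_delay=0.1))
```

On the Nextflow side, the closest equivalent is the process-level errorStrategy 'retry' directive together with maxRetries, which resubmits a failed task. Whether that helps here depends on whether the failed driver installation surfaces as an ordinary task failure, which I have not verified.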

answered 2021-02-25T02:41:28.003