
Is there a way to force services deployed using Google Cloud Run for Anthos (hosted on GKE) to be scheduled to node pools that have a GPU?

I created a Kubernetes cluster by going to Kubernetes -> Create Cluster -> GPU Accelerated Computing. This created a Kubernetes cluster with a gpu-pool-1 node pool, containing nodes with a GPU, and a standard-pool-1 node pool, containing nodes without a GPU.

Is there a way I can deploy Cloud Run containers to nodes having a GPU? Maybe by configuring a custom namespace or something?


Note that there is a similar question from close to a year ago, but I do not think that the accepted answer ("Cloud Run on Kubernetes does not support GPUs") is entirely correct.


2 Answers


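This is a hot topic in Knative Serving development.

For now, it is not possible to set node selectors or tolerations on pods that are spawned by Knative Serving, but the team is working on a solution.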

answered 2020-01-22T09:51:00.483

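There seems to be a way to get this working, at least in a hacky way, as described here.

The Knative service configuration file does appear to accept and respect the limits: nvidia.com/gpu: 1 parameter. Although the Cloud Run interface does not let us specify this parameter ourselves, we can use the kubectl CLI to manually deploy a Knative service defined by a yaml file that includes it.

First, we need to create a GKE cluster with a CPU node pool, a GPU node pool, and Cloud Run for Anthos enabled. This can be done by going to Kubernetes Engine -> Create Cluster -> Selecting "GPU Accelerated Computing" on the left cluster templates bar -> Checking the "Enable Cloud Run for Anthos". Once the cluster has been created, we can click the "Connect" button and start a Cloud Shell. There, we can create a service.yaml file that defines our Knative service. For example, we can adapt the service.yaml file from the Knative documentation, but specify that this service requires a GPU: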

# service.yaml
apiVersion: serving.knative.dev/v1 # Current version of Knative
kind: Service
metadata:
  name: helloworld-go # The name of the app
  namespace: default # The namespace the app will use
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go # The URL to the image of the app
          env:
            - name: TARGET # The environment variable printed out by the sample app
              value: "Go Sample v1"
          resources:
            limits:
              nvidia.com/gpu: 1 # The service must be run on a machine with at least one GPU

We can deploy this service using:

kubectl apply -f service.yaml

And check its status with:

kubectl get ksvc helloworld-go

The helloworld-go service should only be scheduled on nodes that have a GPU. The service should show up on the Cloud Run dashboard like any other Cloud Run for Anthos service.
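
To confirm where the pods actually landed, something like the following sketch may help. It is not part of the original steps: the helper names ksvc_pod_nodes and nodes_with_pool are made up for illustration, and it assumes the pods carry the serving.knative.dev/service label that Knative Serving puts on them and that the nodes carry the cloud.google.com/gke-nodepool label that GKE sets.

```shell
# Hypothetical helper: print each pod of a Knative service and the node it runs on.
# $1 = service name, $2 = namespace (defaults to "default").
# Assumes the pods are labelled serving.knative.dev/service=<service name>.
ksvc_pod_nodes() {
  kubectl get pods -n "${2:-default}" \
    -l "serving.knative.dev/service=$1" \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}

# Hypothetical helper: list nodes with an extra column showing their GKE node pool,
# so the NODE values above can be matched against gpu-pool-1 / standard-pool-1.
nodes_with_pool() {
  kubectl get nodes -L cloud.google.com/gke-nodepool
}

# Usage (run in Cloud Shell after connecting to the cluster):
#   ksvc_pod_nodes helloworld-go
#   nodes_with_pool
```

If the service has scaled down to zero, the first command may list no pods until the service receives a request.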

answered 2020-01-22T18:54:32.323