The dask-yarn documentation has the following example.

from dask_yarn import YarnCluster
from dask.distributed import Client

# Create a cluster where each worker has two cores and eight GiB of memory
cluster = YarnCluster(environment='environment.tar.gz',
                      worker_vcores=2,
                      worker_memory="8GiB")
# Scale out to ten such workers
cluster.scale(10)

# Connect to the cluster
client = Client(cluster)

What is the definition of "worker" in this example? Is it a node (i.e. hardware)? Is it a process? What exactly am I passing to YarnCluster and cluster.scale?

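More specifically, suppose I have a cluster of 20 nodes, each with 8 CPUs and 32 GB of RAM, and I want to maximize resource utilization. Do I have to set worker_vcores = 8 and worker_memory = 32 and call cluster.scale(20)? Or would a different configuration work as well, for example worker_vcores = 4, worker_memory = 16, and cluster.scale(40)?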
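To make the comparison concrete, here is a rough arithmetic sketch of the two configurations. It does not use dask-yarn at all; `WorkerLayout` and `workers_per_node` are hypothetical helpers written only for this question. It assumes the 32 GB per node can be treated as 32 GiB and that all of a node's CPUs and memory are available to workers, which is probably not true on a real cluster where the OS and YARN itself need some headroom.

```python
from dataclasses import dataclass

# Cluster described in the question (assumption: every node is fully
# available to workers, with no memory or CPU reserved for anything else).
NODES = 20
NODE_VCORES = 8
NODE_MEMORY_GIB = 32


@dataclass(frozen=True)
class WorkerLayout:
    """Hypothetical helper: one candidate set of per-worker settings."""
    worker_vcores: int
    worker_memory_gib: int
    n_workers: int

    def totals(self):
        """Total (vcores, memory in GiB) requested across all workers."""
        return (self.worker_vcores * self.n_workers,
                self.worker_memory_gib * self.n_workers)


def workers_per_node(layout):
    """How many workers of this size fit on one node, by pure arithmetic."""
    return min(NODE_VCORES // layout.worker_vcores,
               NODE_MEMORY_GIB // layout.worker_memory_gib)


layout_a = WorkerLayout(worker_vcores=8, worker_memory_gib=32, n_workers=20)
layout_b = WorkerLayout(worker_vcores=4, worker_memory_gib=16, n_workers=40)

for name, layout in (("A", layout_a), ("B", layout_b)):
    vcores, memory = layout.totals()
    print(name, "requests", vcores, "vcores and", memory, "GiB;",
          workers_per_node(layout), "worker(s) fit per node")
```

On paper, both configurations request the same totals, so my question is really whether the number of workers and the size of each worker matter beyond that.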
