
I've started running Spark clusters on Google Compute Engine, backed by Google Cloud Storage, deployed with bdutil (from the GoogleCloudPlatform GitHub), which I do as follows:

./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket deploy

I expect I may want to start with a 2-node cluster (the default) and then add another worker to handle a big job that needs to run. If possible, I'd like to do that without completely tearing down and redeploying the cluster.

I've tried redeploying with the same command but a different number of nodes, and also running "create" followed by "run_command_group install_connectors", as shown below, but for each of these I get errors about nodes that already exist, e.g.

./bdutil -n 3 -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket deploy

or

./bdutil -n 3 -b myhdfsbucket create
./bdutil -n 3 -t workers -b myhdfsbucket run_command_group install_connectors

I've also tried snapshotting and cloning one of the workers that's already running, but not all of the services seem to come up correctly, and I'm a bit out of my depth there.

Any guidance on how I can/should add and/or remove nodes from an existing cluster?


1 Answer


Update: We've added resize_env.sh to the base bdutil repository, so you no longer need to go to my fork for it.

Original answer:

There's no official support yet for resizing a bdutil-deployed cluster, but it's certainly something we've discussed before, and it's actually fairly feasible to put together some basic resize support. This may take a different form once merged into the main branch, but I've pushed a first draft of resize support to my fork of bdutil. It's implemented in two commits: one allows skipping all "master" operations (including create, run_command, delete, etc.), and the other adds the resize_env.sh file.

I haven't tested it against all combinations of the other bdutil extensions, but I've at least successfully used it with the base bdutil_env.sh plus extensions/spark/spark_env.sh. In theory it should also work with your bigquery and datastore extensions. To use it in your case:

# Assuming you initially deployed with this command (default n == 2)
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 deploy

# Before this step, edit resize_env.sh and set NEW_NUM_WORKERS to what you want.
# Currently it defaults to 5.
# Deploy only the new workers, e.g. {hadoop-w-2, hadoop-w-3, hadoop-w-4}:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh deploy

# Explicitly start the Hadoop daemons on just the new workers:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh run_command -t workers -- "service hadoop-hdfs-datanode start && service hadoop-mapreduce-tasktracker start"

# If using Spark as well, explicitly start the Spark daemons on the new workers:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh run_command -t workers -u extensions/spark/start_single_spark_worker.sh -- "./start_single_spark_worker.sh"

# From now on, it's as if you originally turned up your cluster with "-n 5".
# When deleting, remember to include those extra workers:
./bdutil -b myhdfsbucket -n 5 delete
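
For reference, the only setting the comments above ask you to edit in resize_env.sh is NEW_NUM_WORKERS (it currently defaults to 5). Conceptually, the relevant line is just the following illustrative excerpt, not the full file:

# In resize_env.sh: the total number of workers you want after resizing.
NEW_NUM_WORKERS=5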

In general, the best-practice recommendation is to condense your configuration into a single file rather than always passing flags. For example, in your case you might want a file called my_base_env.sh containing:

import_env bigquery_env.sh
import_env datastore_env.sh
import_env extensions/spark/spark_env.sh

NUM_WORKERS=2
CONFIGBUCKET=myhdfsbucket

Then the resize commands are much shorter:

# Assuming you initially deployed with this command (default n == 2)
./bdutil -e my_base_env.sh deploy

# Before this step, edit resize_env.sh and set NEW_NUM_WORKERS to what you want.
# Currently it defaults to 5.
# Deploy only the new workers, e.g. {hadoop-w-2, hadoop-w-3, hadoop-w-4}:
./bdutil -e my_base_env.sh -e resize_env.sh deploy

# Explicitly start the Hadoop daemons on just the new workers:
./bdutil -e my_base_env.sh -e resize_env.sh run_command -t workers -- "service hadoop-hdfs-datanode start && service hadoop-mapreduce-tasktracker start"

# If using Spark as well, explicitly start the Spark daemons on the new workers:
./bdutil -e my_base_env.sh -e resize_env.sh run_command -t workers -u extensions/spark/start_single_spark_worker.sh -- "./start_single_spark_worker.sh"

# From now on, it's as if you originally turned up your cluster with "-n 5".
# When deleting, remember to include those extra workers:
./bdutil -b myhdfsbucket -n 5 delete
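
If you want to sanity-check the new workers before running jobs, you can reuse the same run_command pattern to confirm the daemons are up on just the new workers; this is only a quick status check of the services started above, not a full verification that the master has registered them:

# Check that the Hadoop worker daemons report as running on the new workers:
./bdutil -e my_base_env.sh -e resize_env.sh run_command -t workers -- "service hadoop-hdfs-datanode status && service hadoop-mapreduce-tasktracker status"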

Finally, this won't be 100% identical to having originally deployed your cluster with -n 5; the files /home/hadoop/hadoop-install/conf/slaves and /home/hadoop/spark-install/conf/slaves on your master node will be missing your new nodes in this case. If you ever plan to use /home/hadoop/hadoop-install/bin/[stop|start]-all.sh or /home/hadoop/spark-install/sbin/[stop|start]-all.sh, you can manually SSH into your master node and edit those files to add your new nodes to the list (a rough sketch follows below); if not, there's no need to change those slaves files.
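
For example, something along these lines should do it, assuming the default hadoop-m master name and hadoop-w-N worker names and that you can SSH via gcloud (add --zone if you don't have a default configured); treat it as a sketch rather than a tested recipe:

# Append the new workers to both slaves files on the master node (illustrative only).
gcloud compute ssh hadoop-m --command 'for w in hadoop-w-2 hadoop-w-3 hadoop-w-4; do echo "$w" | sudo tee -a /home/hadoop/hadoop-install/conf/slaves >/dev/null; echo "$w" | sudo tee -a /home/hadoop/spark-install/conf/slaves >/dev/null; done'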

Answered 2015-02-12T05:24:46.397