linux - Getting "sudo: unknown user: hadoop" and "sudo: unable to initialize policy plugin" errors on Google Cloud Platform while running a Hadoop cluster

Question

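I am trying to deploy the sample Hadoop application that Google provides at https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop on Google Cloud Platform.

I followed all of the setup instructions given there step by step. I was able to set up the environment and start the cluster successfully, but I cannot run the MapReduce part. I execute this command in my terminal: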

./compute_cluster_for_hadoop.py mapreduce <project ID> <bucket name> [--prefix <prefix>]  \
--input gs://<input directory on Google Cloud Storage>  \
--output gs://<output directory on Google Cloud Storage>  \
--mapper sample/shortest-to-longest-mapper.pl  \
--reducer sample/shortest-to-longest-reducer.pl  \
--mapper-count 5  \
--reducer-count 1

I get the following error:

sudo: unknown user: hadoop
sudo: unable to initialize policy plugin
Traceback (most recent call last):
  File "./compute_cluster_for_hadoop.py", line 230, in <module>
    main()
  File "./compute_cluster_for_hadoop.py", line 226, in main
    ComputeClusterForHadoop().ParseArgumentsAndExecute(sys.argv[1:])
  File "./compute_cluster_for_hadoop.py", line 222, in ParseArgumentsAndExecute
    params.handler(params)
  File "./compute_cluster_for_hadoop.py", line 51, in MapReduce
    gce_cluster.GceCluster(flags).StartMapReduce()
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 545, in StartMapReduce
    input_dir, output_dir)
  File "/home/ubuntu-gnome/Hadoop-sample-app/solutions-google-compute-engine-cluster-for-hadoop-master/gce_cluster.py", line 462, in _StartScriptAtMaster
    raise RemoteExecutionError('Remote execution error')
gce_cluster.RemoteExecutionError: Remote execution error

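Since I followed every step exactly as written, I cannot understand why this problem occurs.

Was the 'hadoop' user actually created by the scripts that ran earlier, or is there a problem with user permissions? Or does the problem lie somewhere else?

Please help me resolve this error. I am stuck here and cannot move forward.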

score 1 · Accepted Answer

The setup process normally creates the user "hadoop" automatically; this is done in startup-script.sh at lines 75-76:

# Set up user and group
groupadd --gid 5555 hadoop
useradd --uid 1111 --gid hadoop --shell /bin/bash -m hadoop

Some part of the setup may actually have failed.
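One way to confirm whether that step ran is to check, on the master instance itself (for example after connecting over SSH), whether the account and group are known to the system. The following is only a minimal sketch using Python's standard pwd and grp modules; the helper names user_exists, group_exists and describe_account are hypothetical and are not part of the sample application.

```python
import grp
import pwd


def user_exists(name):
    """Return True if the system knows a user account called `name`."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        # getpwnam raises KeyError when no such entry exists
        return False


def group_exists(name):
    """Return True if the system knows a group called `name`."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def describe_account(name="hadoop"):
    """Summarise whether the user and group of the given name are present."""
    user_state = "present" if user_exists(name) else "missing"
    group_state = "present" if group_exists(name) else "missing"
    return "user %s: %s, group %s: %s" % (name, user_state, name, group_state)


if __name__ == "__main__":
    # Run this on the instance where sudo reported the unknown user.
    print(describe_account())
```

If the user is reported as missing, the groupadd/useradd lines above most likely never executed on that instance, which would point to the startup script failing before it reached them.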

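That said, the example you are referencing can still serve as a starting point if you are writing your own Python application that interacts directly with the GCE API, but it has been deprecated as a way to deploy Hadoop on Google Compute Engine. If you really want to use Hadoop, you should use the Google-supported deployment tool bdutil and its associated quickstart. The clusters it deploys share some similarities with the sample, including the setup of the user hadoop. A key difference, however, is that bdutil will also include and configure the GCS connector for Hadoop, so that your MapReduce jobs can operate directly on data in GCS without first copying it into HDFS.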
