
We are working on a Greenplum cluster with HAWQ installed. I would like to run a Hadoop streaming job, but it seems that Hadoop is not configured or not started. How can I start MapRed so that I can use Hadoop streaming?


3 Answers


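First, make sure the cluster is up and running. To check, open Pivotal Command Center (the link usually looks like https://<admin_node>:5443/) and look at the cluster status, or ask your administrator to do so.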

Next, make sure the PHD client libraries are installed on the machine you are launching the job from. Run "rpm -qa | grep phd" to check.

Then, if the cluster is running and the libraries are installed, you can run a streaming job like this:

hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -reducer /bin/wc -input /example.txt -output /testout

The /example.txt file must already exist on HDFS.
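Before submitting to the cluster, it can help to dry-run the same mapper and reducer locally, because a single-reducer streaming job behaves roughly like `mapper | sort | reducer` over the input lines. A minimal Python sketch, assuming `cat` and `wc` are on the local PATH; `run_streaming_locally` is a hypothetical helper for local testing, not part of Hadoop, and sorting whole lines only approximates Hadoop's sort by key.

```python
import subprocess
from typing import Sequence


def run_streaming_locally(input_text: str,
                          mapper: Sequence[str],
                          reducer: Sequence[str]) -> str:
    """Approximate a single-reducer streaming job as mapper | sort | reducer.

    Hypothetical helper for local testing only; it does not contact Hadoop.
    """
    # Map phase: the mapper reads the raw input lines on stdin.
    mapped = subprocess.run(list(mapper), input=input_text,
                            capture_output=True, text=True,
                            check=True).stdout
    # Shuffle/sort phase: Hadoop sorts mapper output by key before the
    # reducer sees it; sorting whole lines is a local approximation.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    # Reduce phase: the reducer reads the sorted lines on stdin.
    return subprocess.run(list(reducer), input=shuffled,
                          capture_output=True, text=True,
                          check=True).stdout


if __name__ == "__main__":
    sample = "hello world\nfoo bar baz\n"
    # Same mapper/reducer as the hadoop jar command, resolved via PATH.
    print(run_streaming_locally(sample, ["cat"], ["wc"]), end="")
```

If the local run fails or prints something unexpected, the cluster job will most likely fail the same way, so this separates script problems from cluster problems.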

answered 2014-11-01T18:18:02.003

Try the following command to get a word count:

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
-input <inputDir> \
-output <outputDir> \
-mapper /bin/cat \
-reducer /bin/wc

If this gives you a correct word count, streaming is working; if it does not, check the errors that this command prints.
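To decide whether the count is "correct", you can compare the job's wc output (for example the text printed by `hadoop fs -cat <outputDir>/part-*`) with counts computed from a local copy of the input. A minimal Python sketch; the function names are hypothetical. It compares only line and word counts, on the assumption that the character count can differ once streaming re-serializes the records, and it sums over several output lines in case the job ran more than one reducer.

```python
from typing import Tuple


def expected_counts(input_text: str) -> Tuple[int, int]:
    """Lines and words the streaming wc reducer should report.

    Each newline-separated line, including a final unterminated one, is one
    record, mirroring Hadoop's text input. Assumes Unix line endings.
    """
    records = input_text.split("\n")
    if records[-1] == "":
        records.pop()  # a trailing newline does not start another record
    # wc counts whitespace-separated tokens as words, as str.split() does.
    return len(records), len(input_text.split())


def parse_wc_output(job_output: str) -> Tuple[int, int]:
    """Sum the line and word columns of every wc line in the job output."""
    lines = words = 0
    for row in job_output.splitlines():
        fields = row.split()  # also drops a trailing tab added by streaming
        if not fields:
            continue
        lines += int(fields[0])
        words += int(fields[1])
    return lines, words


def job_output_matches(input_text: str, job_output: str) -> bool:
    """True if the job's line and word counts match the local input."""
    return parse_wc_output(job_output) == expected_counts(input_text)
```

For example, `job_output_matches(open("example.txt").read(), output_text)` would compare a local copy of the input against the captured job output.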

answered 2014-10-21T08:57:12.760

I did this a long time ago with Greenplum/Pivotal Hadoop:

--1. To install: icm_client deploy, e.g. icm_client deploy HIVE

--2. To check status:
    HDFS:
      service hadoop-namenode status
      service hadoop-datanode status
      service hadoop-secondarynamenode status
    MapRed:
      service hadoop-jobtracker status
      service hadoop-tasktracker status
    Hive:
      service hive-server status
      service hive-metastore status

--3. To start/stop/restart a service (hive-server shown here):
      service hive-server start
      service hive-server stop
      service hive-server restart

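Note: you will find all of these commands and the details in the installation guide, which may also be available online as a Hadoop installation guide.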
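Since the question is about MapRed specifically, the status checks from step 2 can be scripted so you can see at a glance which daemons (for MapRed, hadoop-jobtracker and hadoop-tasktracker) are down. A minimal Python sketch, assuming LSB-style init scripts where "service <name> status" exits with 0 when the daemon is running. The service names are the ones listed in this answer and may differ on your install; the helper names are hypothetical. It prints the start commands rather than running them, because starting services normally requires root.

```python
import subprocess
from typing import Callable, Dict, List

# Service names as listed in the answer; adjust them to your installation.
SERVICES: Dict[str, List[str]] = {
    "HDFS": ["hadoop-namenode", "hadoop-datanode", "hadoop-secondarynamenode"],
    "MapRed": ["hadoop-jobtracker", "hadoop-tasktracker"],
    "Hive": ["hive-server", "hive-metastore"],
}

Runner = Callable[[List[str]], int]


def run_command(cmd: List[str]) -> int:
    """Run a command quietly and return its exit code."""
    try:
        return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        return 127  # shell convention for "command not found"


def stopped_services(runner: Runner = run_command) -> List[str]:
    """Return services whose 'service <name> status' exits non-zero."""
    stopped = []
    for names in SERVICES.values():
        for name in names:
            if runner(["service", name, "status"]) != 0:
                stopped.append(name)
    return stopped


def start_commands(names: List[str]) -> List[List[str]]:
    """Build, but do not run, the 'service <name> start' commands."""
    return [["service", name, "start"] for name in names]


if __name__ == "__main__":
    for cmd in start_commands(stopped_services()):
        # Printed rather than executed: starting services usually needs root.
        print(" ".join(cmd))
```

Passing a fake runner to stopped_services makes the logic testable on a machine without Hadoop installed.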

Thanks,

answered 2014-12-04T07:32:40.147