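I set up Hadoop (2.6.0) in multi-node mode: 1 NameNode + 3 DataNodes. When I run start-all.sh, all the daemons (NameNode, DataNode, ResourceManager, NodeManager) start fine. I checked with the jps command, and the output on each node is as follows: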

NameNode:

7300 ResourceManager
6942 NameNode
7154 SecondaryNameNode

DataNode:

3840 DataNode
3924 NodeManager

I also uploaded a sample text file to HDFS: /user/hadoop/data/sample.txt. There were no errors at that point.

But when I try to run MapReduce using the Hadoop examples jar:

hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/data/sample.txt /user/hadoop/output

I get this error:

15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 running    in uber mode : false
15/04/08 03:31:26 INFO mapreduce.Job:  map 0% reduce 0%
15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 failed with     state FAILED due to: Application application_1428478232474_0001 failed 2 times due to Error launching appattempt_1428478232474_0001_000002. Got exception: java.net.ConnectException: Call From hadoop/127.0.0.1 to localhost:53245 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 9 more Failing the application.
15/04/08 03:31:26 INFO mapreduce.Job: Counters: 0

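As for the configuration, I made sure the NameNode can ssh to the DataNodes and vice versa without a password prompt. I also disabled IPv6 and edited the /etc/hosts file: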

127.0.0.1 localhost hadoop
192.168.56.102 hadoop-nn
192.168.56.103 hadoop-dn1
192.168.56.104 hadoop-dn2
192.168.56.105 hadoop-dn3

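I don't know why MapReduce fails to run even though the NameNode and DataNodes work fine. I'm pretty much stuck here. Can you help me find the cause?

Thanks

Edit: here is the configuration in hdfs-site.xml (NameNode):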

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hadoop_stores/hdfs/namenode</value>
    <description>NameNode directory for namespace and transaction logs storage.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>false</value>
</property>
<property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
</property>
<property>
     <name>dfs.namenode.http-address</name>
     <value>hadoop-nn:50070</value>
     <description>Your NameNode hostname for http access.</description>
</property>
<property>
     <name>dfs.namenode.secondary.http-address</name>
     <value>hadoop-nn:50090</value>
     <description>Your Secondary NameNode hostname for http access.</description>
</property>

On the DataNodes:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hadoop_stores/hdfs/data/datanode</value>
    <description>DataNode directory</description>
</property>

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>false</value>
</property>
<property>
     <name>dfs.namenode.http-address</name>
     <value>hadoop-nn:50070</value>
     <description>Your NameNode hostname for http access.</description>
</property>
<property>
     <name>dfs.namenode.secondary.http-address</name>
     <value>hadoop-nn:50090</value>
     <description>Your Secondary NameNode hostname for http access.</description>
</property>

Here is the output of the command hadoop fs -ls /user/hadoop/data:

hadoop@hadoop:~/DATA$ hadoop fs -ls /user/hadoop/data
15/04/09 00:23:27 Found 2 items
-rw-r--r-- 3 hadoop supergroup 29 2015-04-09 00:22 /user/hadoop/data/sample.txt
-rw-r--r-- 3 hadoop supergroup 27 2015-04-09 00:22 /user/hadoop/data/sample1.txt

hadoop fs -ls /user/hadoop/output

ls: `/user/hadoop/output': No such file or directory

2 Answers

Firewall issue:

java.net.ConnectException: Connection refused

This error may be caused by a firewall. Run the following in a terminal:

sudo apt-get install iptables-persistent
sudo iptables -L
sudo mkdir -p /usr/iptables-backup
sudo sh -c 'iptables-save > /usr/iptables-backup/iptables.v4.rules'

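Before continuing, check that the file was created (it will be used to restore the firewall if something goes wrong).

Now flush the iptables rules (this effectively disables the firewall, provided the default chain policies are ACCEPT):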

sudo iptables -F

Now run:

sudo iptables -L

This command should list no rules. Now try running your map/reduce job.

Note: to restore iptables to its previous state, type this in a terminal:

sudo iptables-restore < /usr/iptables-backup/iptables.v4.rules
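
Before flushing firewall rules, it can help to check whether the port is reachable at all. The following is a minimal Python sketch and not part of the original answer: the function name can_connect is hypothetical, and the host and port you pass in should come from your own error message (hadoop-dn1 and 53245 below are taken from the question and will likely differ in your logs).

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, can_connect("hadoop-dn1", 53245) could be run from the NameNode host. The sketch does not distinguish a refused connection from a timeout, so a False result alone does not prove that a firewall is the cause.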

Answered 2015-04-09T05:32:45.590

Found the solution! See this post: "yarn shows datanodes id/name as localhost".

Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:56148 failed on connection exception: java.net.ConnectException: Connection refused;

Both the master and the slaves had localhost.localdomain as the hostname in /etc/hostname.
I changed the slaves' hostnames to slave1 and slave2. That worked. Thanks, everyone, for your time.
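
The telling detail in the error line is the destination: the call goes to a localhost name rather than to a worker's hostname. As a rough illustration, here is a small Python sketch that pulls the endpoints out of such a message. The names CALL_RE, parse_call, and targets_localhost are hypothetical, and the pattern is only assumed to match the "Call From ... to ..." wording shown above.

```python
import re

# Pattern for the "Call From <host>/<ip> to <host>:<port>" fragment of the message.
CALL_RE = re.compile(
    r"Call From (?P<src_host>[^/\s]+)/(?P<src_ip>\S+) "
    r"to (?P<dst_host>[^:\s]+):(?P<dst_port>\d+)"
)

def parse_call(message):
    """Return the endpoints named in the message as a dict, or None if absent."""
    match = CALL_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["dst_port"] = int(fields["dst_port"])
    return fields

def targets_localhost(message):
    """True if the destination host is localhost or a localhost.* name."""
    fields = parse_call(message)
    if fields is None:
        return False
    host = fields["dst_host"]
    return host == "localhost" or host.startswith("localhost.")
```

If targets_localhost returns True for the error in your job output, the hostname settings on the worker nodes are worth checking before anything else.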

@kate Make sure /etc/hostname on the NameNode and the DataNodes is not set to localhost. Just type ~# hostname in a terminal to check. You can set a new hostname with the same command.
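
The same check can be scripted. This is a minimal Python sketch under the assumption that the node's own hostname is resolvable; is_loopback and hostname_report are hypothetical helper names, not part of Hadoop.

```python
import ipaddress
import socket

def is_loopback(address: str) -> bool:
    """True if the address lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback

def hostname_report():
    """Return (hostname, resolved address, whether that address is loopback)."""
    name = socket.gethostname()
    # Raises socket.gaierror if the hostname cannot be resolved at all.
    address = socket.gethostbyname(name)
    return name, address, is_loopback(address)
```

Running hostname_report() on each node shows which name the machine reports for itself and whether that name resolves to a loopback address, which is the situation to avoid here.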

My /etc/hosts on the master and on the workers (slaves) looks like this:

127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
#127.0.1.1    localhost
192.168.111.72  master
192.168.111.65  worker1
192.168.111.66  worker2

Hostname of worker1:

hduser@worker1:/mnt/hdfs/datanode$ cat /etc/hostname 
worker1

and of worker2:

hduser@worker2:/usr/local/hadoop/logs$ cat /etc/hostname 
worker2

Also, you probably don't want to map the "hadoop" hostname to the loopback interface, i.e.

127.0.0.1 localhost hadoop 

Check point (1) at https://wiki.apache.org/hadoop/ConnectionRefused for this.
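
To spot such entries quickly, a hosts file can be scanned for non-localhost names attached to a loopback address. The following Python sketch is only a heuristic: loopback_aliases is a hypothetical helper, and treating any name that contains "localhost" or "loopback" as harmless is an assumption made for this example.

```python
import ipaddress

def loopback_aliases(hosts_text: str):
    """Return names mapped to a loopback address that are not localhost variants."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            if not ipaddress.ip_address(address).is_loopback:
                continue
        except ValueError:
            # Not a parseable IP address, so skip the line.
            continue
        for name in names:
            if "localhost" not in name and "loopback" not in name:
                flagged.append(name)
    return flagged
```

Passing in the contents of /etc/hosts, for instance via open("/etc/hosts").read(), returns the names that deserve a second look. For the hosts file in the question, the name hadoop would be flagged.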

Thank you.

Answered 2015-05-28T16:53:06.023