I'm running a Hadoop cluster (version: CDH 4.1.1) with two NameNodes configured for HA.

Step 1.

When I try to start my NameNode, I get this exception:

2013-03-27 16:52:21,282 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data/dfs/nn state: NOT_FORMATTED
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:288)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:201)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1128)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1192)
2013-03-27 16:52:21,285 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

Step 2.

Then I tried running sudo hdfs namenode -recover, and I got:

13/03/27 16:53:37 INFO hdfs.StateChange: STATE* Safe mode is ON. 
Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

Step 3.

Following that message, I ran sudo hdfs dfsadmin -safemode leave, and I got:

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
13/03/27 16:55:17 WARN retry.RetryInvocationHandler: Exception while invoking setSafeMode of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 996ms.
13/03/27 16:55:18 WARN retry.RetryInvocationHandler: Exception while invoking setSafeMode of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 2085ms.
......retrying......
Not retrying because failovers (15) exceeded maximum allowed (15)
java.net.ConnectException: Call From namenode-01.local/10.**.**.24 to namenode-02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Any ideas are greatly appreciated.

1 Answer

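Unless some strange magic happened, you forgot to format the NameNode (as the exception already says: the storage directory /data/dfs/nn is in state NOT_FORMATTED). If you haven't done so yet, run hadoop namenode -format (on CDH4 the equivalent is hdfs namenode -format). Note that formatting the NameNode is destructive: it wipes any existing filesystem metadata in the name directory.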
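
Because formatting is destructive, it may be worth checking the name directory before running it. The following is a minimal sketch, not an official tool: the function names name_dir_state and suggest_command are made up for illustration, and it assumes that a formatted NameNode storage directory contains a current/VERSION file. It only prints a suggestion and never runs hdfs itself.

```shell
#!/bin/sh
# Hypothetical helper: report whether a NameNode storage directory looks formatted.
# Assumption: a formatted directory contains the file current/VERSION.
name_dir_state() {
    if [ -f "$1/current/VERSION" ]; then
        echo "formatted"
    else
        echo "not-formatted"
    fi
}

# Hypothetical helper: print (but do not execute) the suggested next step.
suggest_command() {
    case "$(name_dir_state "$1")" in
        formatted)
            echo "nothing to do: $1 already has current/VERSION"
            ;;
        *)
            # Destructive: only appropriate for a name directory with no metadata to keep.
            echo "hdfs namenode -format"
            ;;
    esac
}
```

For the directory from the question, this would be called as suggest_command /data/dfs/nn. In an HA pair, the format is normally done on one NameNode only; the second one is then usually initialized from the first with hdfs namenode -bootstrapStandby rather than formatted independently.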

Answered 2013-03-31T21:17:16.853