I am running a Hadoop cluster (version: cdh4.1.1) with two NameNodes configured for HA.
Step 1.
When I try to start my NameNode, I get this exception:
2013-03-27 16:52:21,282 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /data/dfs/nn state: NOT_FORMATTED
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:288)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:201)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1128)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1192)
2013-03-27 16:52:21,285 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
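To make the NOT_FORMATTED state in the exception concrete, here is a minimal Python sketch that inspects the directory named in the log. It assumes that a formatted NameNode storage directory contains a current/VERSION file; that is my understanding of the on-disk layout, not something the log itself confirms. The function name and the HAS_VERSION_FILE label are hypothetical and only exist for this example.

```python
import os


def name_dir_state(path):
    """Roughly classify a NameNode storage directory.

    Assumption: a formatted directory contains current/VERSION.
    The returned strings are labels for this sketch only; they
    loosely mirror the state names printed in the NameNode log.
    """
    if not os.path.isdir(path):
        return "NON_EXISTENT"
    version_file = os.path.join(path, "current", "VERSION")
    if os.path.isfile(version_file):
        return "HAS_VERSION_FILE"  # hypothetical label, not a Hadoop state
    return "NOT_FORMATTED"


if __name__ == "__main__":
    # /data/dfs/nn is the directory named in the exception above.
    print(name_dir_state("/data/dfs/nn"))
```

Running this on the failing node should at least show whether /data/dfs/nn is missing entirely or exists without a VERSION file.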
Step 2.
Then I tried running: sudo hdfs namenode -recover
and got:
13/03/27 16:53:37 INFO hdfs.StateChange: STATE* Safe mode is ON.
Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
Step 3.
Following that instruction, I ran: sudo hdfs dfsadmin -safemode leave
and got:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
13/03/27 16:55:17 WARN retry.RetryInvocationHandler: Exception while invoking setSafeMode of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 996ms.
13/03/27 16:55:18 WARN retry.RetryInvocationHandler: Exception while invoking setSafeMode of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 2085ms.
......retrying......
Not retrying because failovers (15) exceeded maximum allowed (15)
java.net.ConnectException: Call From namenode-01.local/10.**.**.24 to namenode-02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
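The "Connection refused" above suggests that nothing is listening on namenode-02.local:8020, which would fit a NameNode process that is not running there. A minimal Python sketch to check that from the client side, assuming plain TCP reachability is a good enough proxy for "the NameNode RPC server is up" (the function name is hypothetical):

```python
import socket


def rpc_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests that something accepts connections on the port;
    it does not verify that the listener is actually a NameNode.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False


# Example call, using the host and port from the exception above:
#   rpc_port_open("namenode-02.local", 8020)
```

If this returns False for both NameNode hosts, the dfsadmin command has no running NameNode to talk to, which points back to the startup failure in step 1.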
Any ideas would be greatly appreciated.