I'm trying to run HBase (0.94.11) in distributed mode on a 3-node Hadoop (1.0.4) cluster, but I want to use only two of the nodes for HBase.
Master/Namenode: cldx-1230-1116 (IP: 172.25.38.245)
Regionserver/Slave: cldx-1229-1117 (IP: 172.25.39.7)
HBase starts, but no regionserver checks in with the master. The logs show the following:
Master/namenode log:
2013-09-03 14:52:23,683 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
2013-09-03 14:52:23,684 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:24,587 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=hconnection
2013-09-03 14:52:24,607 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 31222@cldx-1230-1116
2013-09-03 14:52:24,610 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 14:52:24,615 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave/172.25.39.7:2222, initiating session
2013-09-03 14:52:24,631 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server slave/172.25.39.7:2222, sessionid = 0x140e363f8090002, negotiated timeout = 180000
2013-09-03 14:52:25,230 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1546 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:26,753 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3068 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:28,266 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4582 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
Regionserver/slave log:
2013-09-03 16:05:18,307 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=regionserver:60020
2013-09-03 16:05:18,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 16:05:18,336 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 14384@cldx-1229-1117
2013-09-03 16:05:18,348 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/172.25.39.7:2222, initiating session
2013-09-03 16:05:18,426 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/172.25.39.7:2222, sessionid = 0x140e363f8090000, negotiated timeout = 180000
2013-09-03 16:05:18,452 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3a9cfedf
2013-09-03 16:05:18,517 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/online-snapshot/acquired already exists and this is not a retry
2013-09-03 16:05:18,557 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: globalMemStoreLimit=393.4m, globalMemStoreLimitLowMark=344.2m, maxHeap=983.4m
2013-09-03 16:05:18,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 2hrs, 46mins, 40sec
2013-09-03 16:05:18,621 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,1378199761324
2013-09-03 16:05:28,697 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at com.sun.proxy.$Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2030)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2076)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
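For reference, the address in the "Attempting connect to Master server at ..." line above has the form hostname,port,startcode. As far as I understand, the regionserver does not make this name up itself; it reads it from ZooKeeper, where the master publishes it. Below is a minimal sketch for splitting it apart, assuming the start code is a timestamp in epoch milliseconds. The helper name parse_server_name is made up for illustration and is not part of HBase.

```python
from datetime import datetime, timezone

def parse_server_name(server_name):
    """Split a 'hostname,port,startcode' string into its three parts."""
    # Split from the right so only the last two commas act as separators
    host, port, startcode = server_name.rsplit(",", 2)
    return host, int(port), int(startcode)

# Value copied from the regionserver log line above
host, port, startcode = parse_server_name("localhost,60000,1378199761324")

# Assumption: the start code is milliseconds since the Unix epoch.
# Integer division drops the milliseconds and avoids float rounding.
started = datetime.fromtimestamp(startcode // 1000, tz=timezone.utc)
print(host, port, started.isoformat())
```

The host part is "localhost", so the name being handed to the regionserver is already wrong before any connection is attempted.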
Slave's ZooKeeper log:
2013-09-03 16:05:18,345 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.39.7:48173
2013-09-03 16:05:18,392 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.39.7:48173
2013-09-03 16:05:18,395 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.5a
2013-09-03 16:05:18,422 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090000 with negotiated timeout 180000 for client /172.25.39.7:48173
2013-09-03 16:05:18,508 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090000 type:create cxid:0x8 zxid:0x5b txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:33,933 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50879
2013-09-03 16:05:33,972 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50879
2013-09-03 16:05:33,975 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090001 with negotiated timeout 180000 for client /172.25.38.245:50879
2013-09-03 16:05:42,358 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0xb zxid:0x5d txntype:-1 reqpath:n/a Error Path:/hbase/master Error:KeeperErrorCode = NodeExists for /hbase/master
2013-09-03 16:05:47,934 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0x1f zxid:0x63 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:49,037 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50889
2013-09-03 16:05:49,042 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50889
2013-09-03 16:05:49,050 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090002 with negotiated timeout 180000 for client /172.25.38.245:50889
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460000, timeout of 180000ms exceeded
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140d02920860000, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460001, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140d02920860000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460001
The regionservers file has only one entry: 172.25.39.7
hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://172.25.38.245:9000/hbase</value>
<description>The directory shared by RegionServers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>172.25.39.7</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/bigdata/hadoop_ecosystem_dir/zookeeper</value>
</property>
</configuration>
- The Hadoop masters file on the namenode (172.25.38.245) has 172.25.38.245
- The Hadoop slaves file on the namenode (172.25.38.245) has 172.25.38.245, 172.25.39.7 and 172.25.36.73
- The Hadoop masters file on the slave (172.25.39.7) has 172.25.38.245
- The Hadoop slaves file on the slave (172.25.39.7) has 172.25.39.7
hosts file on master:
#127.0.0.1 localhost
#172.25.38.245 localhost
172.25.38.245 cldx-1230-1116
172.17.88.75 cloudx
172.25.38.245 master
172.25.39.7 slave
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
hosts file on slave:
#127.0.0.1 localhost
#172.25.39.7 localhost
172.25.39.7 cldx-1229-1117 cldx-1229-1117
172.25.38.245 cldx-1230-1116 cldx-1230-1116
172.17.88.75 cloudx
172.25.38.245 master
172.25.39.7 slave
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I can't work out why the regionserver/slave is trying to connect to the master at localhost rather than at 172.25.38.245.
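One way to cross-check the hosts files above is to list which names each address maps to and which addresses still carry the name localhost. The following is a minimal sketch, not a definitive diagnosis: parse_hosts is a hypothetical helper, the sample text is the non-comment part of the master's hosts file shown above, and real name resolution may also involve DNS and the resolver configuration, which this does not model.

```python
def parse_hosts(text):
    """Map each address in hosts-file text to the names listed for it."""
    names_by_ip = {}
    for line in text.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        names_by_ip.setdefault(ip, []).extend(names)
    return names_by_ip

# Entries copied from the master's hosts file (IPv6 multicast lines omitted)
MASTER_HOSTS = """\
#127.0.0.1 localhost
#172.25.38.245 localhost
172.25.38.245 cldx-1230-1116
172.17.88.75 cloudx
172.25.38.245 master
172.25.39.7 slave
::1 localhost ip6-localhost ip6-loopback
"""

hosts = parse_hosts(MASTER_HOSTS)
print(hosts["172.25.38.245"])  # ['cldx-1230-1116', 'master']
print([ip for ip, names in hosts.items() if "localhost" in names])  # ['::1']
```

On the actual machines, Python's socket.gethostname() and socket.getfqdn() report what the operating system itself resolves, which can then be compared with the name that appears in the regionserver log.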