I'm trying to run HBase (0.94.11) in distributed mode on a 3-node Hadoop (1.0.4) cluster, but I want to use only two of the nodes for HBase.

Master/Namenode: cldx-1230-1116 (IP: 172.25.38.245)
Regionserver/Slave: cldx-1229-1117 (IP: 172.25.39.7)

HBase starts, but no regionserver shows up. The logs show the following errors:

Master/namenode log :

2013-09-03 14:52:23,683 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
2013-09-03 14:52:23,684 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:24,587 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=hconnection
2013-09-03 14:52:24,607 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 31222@cldx-1230-1116
2013-09-03 14:52:24,610 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 14:52:24,615 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave/172.25.39.7:2222, initiating session
2013-09-03 14:52:24,631 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server slave/172.25.39.7:2222, sessionid = 0x140e363f8090002, negotiated timeout = 180000
2013-09-03 14:52:25,230 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1546 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:26,753 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3068 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2013-09-03 14:52:28,266 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4582 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

regionserver/slave log :

2013-09-03 16:05:18,307 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.25.39.7:2222 sessionTimeout=180000 watcher=regionserver:60020
2013-09-03 16:05:18,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/172.25.39.7:2222. Will not attempt to authenticate using SASL (unknown error)
2013-09-03 16:05:18,336 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 14384@cldx-1229-1117
2013-09-03 16:05:18,348 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/172.25.39.7:2222, initiating session
2013-09-03 16:05:18,426 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/172.25.39.7:2222, sessionid = 0x140e363f8090000, negotiated timeout = 180000
2013-09-03 16:05:18,452 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3a9cfedf
2013-09-03 16:05:18,517 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/online-snapshot/acquired already exists and this is not a retry
2013-09-03 16:05:18,557 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: globalMemStoreLimit=393.4m, globalMemStoreLimitLowMark=344.2m, maxHeap=983.4m
2013-09-03 16:05:18,561 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 2hrs, 46mins, 40sec
2013-09-03 16:05:18,621 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,1378199761324
2013-09-03 16:05:28,697 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1127)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
    at com.sun.proxy.$Proxy8.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2030)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2076)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
    at java.lang.Thread.run(Thread.java:722)

slave's zookeeper log :

2013-09-03 16:05:18,345 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.39.7:48173
2013-09-03 16:05:18,392 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.39.7:48173
2013-09-03 16:05:18,395 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.5a
2013-09-03 16:05:18,422 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090000 with negotiated timeout 180000 for client /172.25.39.7:48173
2013-09-03 16:05:18,508 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090000 type:create cxid:0x8 zxid:0x5b txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:33,933 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50879
2013-09-03 16:05:33,972 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50879
2013-09-03 16:05:33,975 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090001 with negotiated timeout 180000 for client /172.25.38.245:50879
2013-09-03 16:05:42,358 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0xb zxid:0x5d txntype:-1 reqpath:n/a Error Path:/hbase/master Error:KeeperErrorCode = NodeExists for /hbase/master
2013-09-03 16:05:47,934 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x140e363f8090001 type:create cxid:0x1f zxid:0x63 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-09-03 16:05:49,037 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /172.25.38.245:50889
2013-09-03 16:05:49,042 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /172.25.38.245:50889
2013-09-03 16:05:49,050 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x140e363f8090002 with negotiated timeout 180000 for client /172.25.38.245:50889
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460000, timeout of 180000ms exceeded
2013-09-03 16:08:15,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140d02920860000, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x140e35e60460001, timeout of 180000ms exceeded
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140d02920860000
2013-09-03 16:08:15,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x140e35e60460001

The regionservers file has only one entry, viz. 172.25.39.7

hbase-site.xml

<configuration>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://172.25.38.245:9000/hbase</value>
  <description>The directory shared by RegionServers.</description>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
  </description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2222</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>172.25.39.7</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/bigdata/hadoop_ecosystem_dir/zookeeper</value>
</property>

</configuration>

  1. The Hadoop masters file on the namenode (172.25.38.245) has 172.25.38.245
  2. The Hadoop slaves file on the namenode (172.25.38.245) has 172.25.38.245, 172.25.39.7 and 172.25.36.73
  3. The Hadoop masters file on the slave (172.25.39.7) has 172.25.38.245
  4. The Hadoop slaves file on the slave (172.25.39.7) has 172.25.39.7

hosts file on master :

#127.0.0.1      localhost
#172.25.38.245   localhost
172.25.38.245   cldx-1230-1116
172.17.88.75    cloudx
172.25.38.245 master
172.25.39.7   slave
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

hosts file on slave :

#127.0.0.1      localhost
#172.25.39.7     localhost
172.25.39.7     cldx-1229-1117 cldx-1229-1117
172.25.38.245     cldx-1230-1116 cldx-1230-1116
172.17.88.75    cloudx
172.25.38.245 master
172.25.39.7   slave
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

I'm clueless as to why the regionserver/slave is trying to connect to the master on localhost rather than on 172.25.38.245!

1 Answer

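Add the IP and hostname of your HMaster to the /etc/hosts file of your RS, then restart the HBase daemons. One probable reason is that your HMaster assumes the IP of the RS is 127.0.0.1 (i.e. localhost) and hence resolves it to its own localhost.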

Yes, JD is absolutely correct. hbase.master is an obsolete property now.
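
To see which names a hosts file really maps to which addresses, something like the following can help. It is only a minimal sketch in Python: the helper names parse_hosts and ips_for are made up for illustration, and the sample text is the active part of the slave's hosts file from the question. It reads the hosts text only; it does not query the system resolver, which may also consult DNS.

```python
# Sketch: inspect the active entries of a hosts file.
# parse_hosts and ips_for are hypothetical helper names, not part of HBase.

SLAVE_HOSTS = """\
#127.0.0.1      localhost
#172.25.39.7     localhost
172.25.39.7     cldx-1229-1117 cldx-1229-1117
172.25.38.245     cldx-1230-1116 cldx-1230-1116
172.17.88.75    cloudx
172.25.38.245 master
172.25.39.7   slave
::1     localhost ip6-localhost ip6-loopback
"""


def parse_hosts(text):
    """Return {ip: [names, ...]} for the uncommented entries of a hosts file."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        mapping.setdefault(ip, []).extend(names)
    return mapping


def ips_for(mapping, name):
    """Return every address that lists the given name, in file order."""
    return [ip for ip, names in mapping.items() if name in names]


if __name__ == "__main__":
    hosts = parse_hosts(SLAVE_HOSTS)
    for name in ("localhost", "master", "slave", "cldx-1230-1116"):
        print(name, "->", ips_for(hosts, name))
```

Running the same check against the real /etc/hosts on both machines (read the file and pass its contents to parse_hosts) shows at a glance whether localhost or either hostname is attached to an unexpected address.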

answered 2013-09-03T20:46:36.273