
I have found that our RegionServers connect to ZooKeeper very frequently. They seem to constantly establish a session, close it, and reconnect. Here are the logs from both the server and client sides. I have no idea why this happens or how to deal with it. We're using HBase 0.94.11 and ZooKeeper 3.4.4.

The log from HBase RegionServer:

2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000 sessionTimeout=30000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will attempt to SASL-authenticate using Login Context section 'Client'
2014-09-18,16:38:17,868 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 11787@lg-hadoop-srv-st05.bj
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000, initiating session
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000, sessionid = 0x248782700e52b3c, negotiated timeout = 30000
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session: 0x248782700e52b3c closed
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-09-18,16:38:17,878 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 24

The log from its ZooKeeper server:

2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket connection from /10.2.201.76:55621
2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Client attempting to establish new session at /10.2.201.76:55621
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Established session 0x248782700e52b3c with negotiated timeout 30000 for client /10.2.201.76:55621
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Successfully authenticated client: authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;  authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Setting authorizedID: hbase_srv
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] adding SASL authorization for authorizationID: hbase_srv
2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:2] Closed socket connection for client /10.2.201.76:55621 which had sessionid 0x248782700e52b3c

1 Answer


Finally, I found the root cause.

Yes, it is related to ReplicationSink. I found this log line: "2014-09-23,14:58:01,736 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Replication for table online_miliao_recent".

Then I looked at the relevant code and found that every call to replicateEntries() also calls sharedHBaseAdmin.tableExists(table).

sharedHBaseAdmin.tableExists() creates a new CatalogTracker object, which is itself a ZooKeeper client.

When the method exits, it cleans up that ZooKeeper client and its session.
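For reference, this is roughly what the HBaseAdmin.tableExists() path looks like in 0.94; this is a paraphrase from memory rather than the verbatim source, so check the code of your exact version:

import java.io.IOException;
import org.apache.hadoop.hbase.catalog.CatalogTracker;
import org.apache.hadoop.hbase.catalog.MetaReader;

// Paraphrased sketch of the 0.94-era HBaseAdmin.tableExists() (not verbatim):
// each call opens its own CatalogTracker, i.e. a brand-new ZooKeeper session,
// and tears it down again in the finally block.
public boolean tableExists(final String tableName) throws IOException {
    boolean exists = false;
    CatalogTracker ct = getCatalogTracker();   // starts a new ZooKeeper client + session
    try {
        exists = MetaReader.tableExists(ct, tableName);
    } finally {
        cleanupCatalogTracker(ct);             // closes it again -> the "Session ... closed" log line
    }
    return exists;
}

That create-and-close pair is exactly the session churn visible in the RegionServer and ZooKeeper logs above.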

So the log actually makes sense, because replication is running. But tableExists() is fairly heavyweight, and I think we should not call it every time we replicate entries. I also noticed that the CatalogTracker is no longer in ReplicationSink after 0.94.11, so this is not a problem for later versions.
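Until then, one possible workaround on 0.94.11 would be to cache positive tableExists() answers in the sink so the ZooKeeper round trip happens at most once per table. This is illustrative only; knownTables and tableExistsCached are hypothetical names, not HBase API:

import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of HBase: remember tables known to exist so
// replicateEntries() stops paying for a fresh ZooKeeper session on every call.
private final Set<String> knownTables =
    Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

private boolean tableExistsCached(String table) throws IOException {
    if (knownTables.contains(table)) {
        return true;                           // cached: no CatalogTracker, no new session
    }
    if (sharedHBaseAdmin.tableExists(table)) { // one ZooKeeper session for the first check only
        knownTables.add(table);                // cache positive answers only; a missing table
        return true;                           // keeps being re-checked on later calls
    }
    return false;
}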

It would be great if I could find the JIRA that removed the CatalogTracker from ReplicationSink :-)

answered 2014-09-25T01:51:29.220