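I have an Amazon EC2 instance on which I'm running a tool called seqware. It is basically a query engine for genomic data that uses an HBase back end. I was running on an AMI with HBase set up in pseudo-distributed mode, but I want to use it in fully distributed mode, so I set up a 2-node hadoop cluster: one node is the master and the other is the slave. I can run the hadoop examples and everything else in fully distributed mode. For seqware to use my fully distributed setup it needs 6 things, all specified in a settings file: the zookeeper quorum, the zookeeper client port, the hbase master, the mapred job tracker, the fs default FS and the fs default name. I set them in the file as follows: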

HBASE.ZOOKEEPER.QUORUM=ip-10-x.x.x
HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT=2181
HBASE.MASTER=ip-10-x.x.x:60010
MAPRED.JOB.TRACKER=ip-10-x.x.x:9001
FS.DEFAULT.NAME=hdfs://ip-10-x.x.x:9000
FS.DEFAULTFS=hdfs://ip-10-x.x.x:9000
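As a way to see which host and port pairs a client on the seqware instance would have to reach with these settings, here is a minimal Python sketch. It is not part of seqware; the names `parse_settings`, `endpoints` and `HOST_KEYS` are made up for illustration, and it assumes the quorum names a single host (a real quorum may be a comma-separated list).

```python
from urllib.parse import urlsplit

# The six settings from above, with the placeholder host name kept as-is.
SETTINGS = """\
HBASE.ZOOKEEPER.QUORUM=ip-10-x.x.x
HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT=2181
HBASE.MASTER=ip-10-x.x.x:60010
MAPRED.JOB.TRACKER=ip-10-x.x.x:9001
FS.DEFAULT.NAME=hdfs://ip-10-x.x.x:9000
FS.DEFAULTFS=hdfs://ip-10-x.x.x:9000
"""

# Keys whose values name a host (hypothetical helper constant).
HOST_KEYS = (
    "HBASE.ZOOKEEPER.QUORUM",
    "HBASE.MASTER",
    "MAPRED.JOB.TRACKER",
    "FS.DEFAULT.NAME",
    "FS.DEFAULTFS",
)


def parse_settings(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def endpoints(settings):
    """Return {key: (host, port)} for every host-valued key that is present."""
    zk_port = int(settings.get("HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT", 2181))
    result = {}
    for key in HOST_KEYS:
        if key not in settings:
            continue
        value = settings[key]
        # urlsplit only recognises a host after "//", so add it when no scheme is given.
        parts = urlsplit(value if "://" in value else "//" + value)
        port = parts.port
        if port is None and key == "HBASE.ZOOKEEPER.QUORUM":
            port = zk_port  # the quorum entry carries no port of its own
        result[key] = (parts.hostname, port)
    return result


if __name__ == "__main__":
    for key, (host, port) in endpoints(parse_settings(SETTINGS)).items():
        print(f"{key}: {host}:{port}")
```

Each pair printed is a destination that has to be reachable from the seqware instance.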

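However, when I start using the query engine I get a ZooKeeper connection loss exception. I have the master's public key in seqware's authorized_keys and vice versa, but I cannot ssh like this: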

ssh ip-10.x.x.x

or even using the public DNS:

ssh {public DNS of instance}

where ip-10.x.x.x is the instance's IP address. I have to use a username:

ssh {username}@ip-10-x.x.x

or

ssh username@{public DNS of instance}

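I can ssh from the hadoop master instance to the hadoop slave instance, and vice versa, without a username, and the configuration files there contain the IP addresses without usernames.

I tried adding the username before the IP address in the settings, thinking there was a 99% chance it would not work, and I was not disappointed: I still get the same exception.

What do I need to do so that I can ssh from the seqware instance to the hadoop and HBase master node without specifying a username, the way I can between the master and the slave?

This is how ZooKeeper is configured on the hadoop master: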

<configuration>
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://ip-10-x.x.x:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-x.x.x</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/{username}/hbase/zookeeper</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>

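I can't speak confidently about seqware's internal implementation, but I do know that it uses the settings file to locate ZooKeeper and the HBase master. In the default, working pseudo-distributed setup, these are the values of the variables I mentioned earlier: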

HBASE.ZOOKEEPER.QUORUM=localhost
HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT=2181
HBASE.MASTER=localhost:60000
MAPRED.JOB.TRACKER=localhost:8021
FS.DEFAULT.NAME=hdfs://localhost:8020
FS.DEFAULTFS=hdfs://localhost:8020

The zoo.cfg file looks like this:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181

ZooKeeper stack trace:

[seqware@master target]$ java -classpath seqware-distribution-0.13.6.8-qe-full.jar com.github.seqware.queryengine.system.ReferenceCreator hg_19 keyValue_ref.out
[SeqWare Query Engine] 0 [main] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper exists failed after 3 retries
[SeqWare Query Engine] 1 [main] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher - hconnection Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:82)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
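A ConnectionLoss error like this one generally means the client could not establish or keep a connection to the ZooKeeper quorum it was given. One way to check that independently of seqware is ZooKeeper's `ruok` four-letter command, to which a healthy server replies `imok`. The sketch below is only an illustration: `zk_ruok` is a made-up helper name, and it assumes four-letter commands are enabled on the server (newer ZooKeeper releases require them to be whitelisted via `4lw.commands.whitelist`).

```python
import socket


def zk_ruok(host, port=2181, timeout=3.0):
    """Return True if a ZooKeeper server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes the socket.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
    return reply == b"imok"
```

Calling `zk_ruok("<quorum host>", 2181)` from the seqware instance, with the real quorum host name filled in, would show whether the client machine can reach ZooKeeper at all; a `False` result points at networking or name resolution rather than at seqware itself.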

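Update 08/13/2013: Apparently the variables that need to be set for a remote HBase setup are not the ones I was editing. Based on the seqware Constants.java file, they are the QE variables: https://github.com/SeqWare/seqware/blob/develop/seqware-queryengine/src/main/java/com/github/seqware/queryengine/Constants.java

I have edited them as follows: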

# SEQWARE QUERY ENGINE AND GENERAL HADOOP SETTINGS
#
HBASE.ZOOKEEPER.QUORUM=localhost
HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT=2181
HBASE.MASTER=localhost:60000
MAPRED.JOB.TRACKER=localhost:8021
FS.DEFAULT.NAME=hdfs://localhost:8020
FS.DEFAULTFS=hdfs://localhost:8020
FS.HDFS.IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
#
# SEQWARE QUERY ENGINE SETTINGS
#
QE_NAMESPACE=SeqWareQE
QE_DEVELOPMENT_DEPENDENCY=file:/home/seqware/jars/seqware-distribution-0.13.6.5-qe-full.jar
QE_PERSIST=true
QE_HBASE_REMOTE_TESTING=true
QE_HBASE_PROPERTIES=HBOOT
QE_HBOOT_HBASE_ZOOKEEPER_QUORUM=ip-10-x.x.x.ec2.internal
QE_HBOOT_HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT=2181
QE_HBOOT_HBASE_MASTER=ip-10-x.x.x.ec2.internal:60010
QE_HBOOT_MAPRED_JOB_TRACKER=ip-10-x.x.x.ec2.internal:9001
QE_HBOOT_FS_DEFAULT_NAME=hdfs://ip-10-x.x.x.ec2.internal:9000
QE_HBOOT_FS_DEFAULTFS=hdfs://ip-10-x.x.x.ec2.internal:9000
QE_HBOOT_FS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem

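I no longer get the ZooKeeper exception, but the command that creates the workspace just hangs for a few minutes until I stop it.

I found the following in my ZooKeeper log. I'm not sure whether it means ZooKeeper crashed or that it lost the connection to the clients it names. I also don't know why it accepts socket connections from ports 36997, 36998, 37000 and 37034 when I haven't even opened those ports in the EC2 security group: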

2013-08-13 16:44:55,560 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1407890cb630000 with negotiated timeout 180000 for client /10.x.x.x:36997
2013-08-13 16:44:57,633 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.x.x.x:36998
2013-08-13 16:44:57,662 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.x.x.x:36998
2013-08-13 16:44:57,666 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1407890cb630001 with negotiated timeout 180000 for client /10.x.x.x:36998
2013-08-13 16:44:57,917 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1407890cb630001 type:create cxid:0x8 zxid:0x219 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-08-13 16:44:58,450 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1407890cb630000 type:create cxid:0xb zxid:0x21a txntype:-1 reqpath:n/a Error Path:/hbase/master Error:KeeperErrorCode = NodeExists for /hbase/master
2013-08-13 16:45:00,927 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.x.x.x:37000
2013-08-13 16:45:00,928 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.x.x.x:37000
2013-08-13 16:45:00,930 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1407890cb630002 with negotiated timeout 180000 for client /10.x.x.x:37000
2013-08-13 16:45:02,165 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1407890cb630000 type:create cxid:0x24 zxid:0x221 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
2013-08-13 16:45:14,172 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.x.x.x:37034
2013-08-13 16:45:14,173 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.x.x.x:37034
2013-08-13 16:45:14,178 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x1407890cb630003 with negotiated timeout 180000 for client /10.x.x.x:37034
2013-08-13 16:47:51,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1407800784a0003, timeout of 180000ms exceeded
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1407800784a0001, timeout of 180000ms exceeded
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1407800784a0000, timeout of 180000ms exceeded
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1407800784a0002, timeout of 180000ms exceeded
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1407800784a0003
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1407800784a0001
2013-08-13 16:47:51,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1407800784a0000
2013-08-13 16:47:51,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1407800784a0002
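One way to read a log like this is to compare the session IDs that were established with the ones that expired. The Python sketch below does that for two representative lines copied from the excerpt; `classify_sessions` is a hypothetical helper name, and the regular expression only assumes the "Established session" and "Expiring session" wording seen above.

```python
import re

# Matches the two kinds of session events that appear in the excerpt.
SESSION_EVENT = re.compile(r"(Established|Expiring) session (0x[0-9a-f]+)")

SAMPLE_LOG = [
    "2013-08-13 16:44:55,560 INFO org.apache.zookeeper.server.ZooKeeperServer: "
    "Established session 0x1407890cb630000 with negotiated timeout 180000 "
    "for client /10.x.x.x:36997",
    "2013-08-13 16:47:51,000 INFO org.apache.zookeeper.server.ZooKeeperServer: "
    "Expiring session 0x1407800784a0003, timeout of 180000ms exceeded",
]


def classify_sessions(log_lines):
    """Return (established_ids, expired_ids) as two sets of session ID strings."""
    established, expired = set(), set()
    for line in log_lines:
        match = SESSION_EVENT.search(line)
        if match is None:
            continue
        kind, session_id = match.groups()
        (established if kind == "Established" else expired).add(session_id)
    return established, expired


if __name__ == "__main__":
    established, expired = classify_sessions(SAMPLE_LOG)
    print("established:", sorted(established))
    print("expired:", sorted(expired))
    print("expired sessions that were established in this excerpt:",
          sorted(established & expired))
```

In the excerpt the expired IDs (0x1407800784a0000 to 0x1407800784a0003) differ from the newly established ones (0x1407890cb630000 to 0x1407890cb630003). That suggests, though it does not prove, that the expirations belong to older sessions timing out after 180000 ms rather than to the connections that were just accepted.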

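I looked at the HBase web interface and it shows that the tables are in fact being created, but the create commands never return a response; they just hang.

[Screenshots: HBase master web interface tables; HBase web interface]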

1 Answer

Try changing this:

<property>
   <name>hbase.zookeeper.quorum</name>
   <value>ip-10-x.x.x</value>
</property>

to this:

<property>
   <name>hbase.zookeeper.quorum</name>
   <value>localhost</value>
</property>

Then restart HBase and ZooKeeper.

I also suggest using this as your seqware configuration:

# SEQWARE QUERY ENGINE AND GENERAL HADOOP SETTINGS
#
HBASE.ZOOKEEPER.QUORUM=localhost
HBASE.ZOOKEEPER.PROPERTY.CLIENTPORT=2181
HBASE.MASTER=localhost:60000
MAPRED.JOB.TRACKER=localhost:8021
FS.DEFAULT.NAME=hdfs://localhost:8020
FS.DEFAULTFS=hdfs://localhost:8020
FS.HDFS.IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
#
# SEQWARE QUERY ENGINE SETTINGS
#
QE_NAMESPACE=SeqWareQE
QE_DEVELOPMENT_DEPENDENCY=file:/home/seqware/jars/seqware-distribution-0.13.6.5-qe-full.jar
QE_PERSIST=true
QE_HBASE_REMOTE_TESTING=true
QE_HBASE_PROPERTIES=HBOOT
QE_HBOOT_HBASE_ZOOKEEPER_QUORUM=ip-10-x.x.x.ec2.internal
QE_HBOOT_HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT=2181
QE_HBOOT_HBASE_MASTER=ip-10-x.x.x.ec2.internal:60010
QE_HBOOT_MAPRED_JOB_TRACKER=ip-10-x.x.x.ec2.internal:9001
QE_HBOOT_FS_DEFAULT_NAME=hdfs://ip-10-x.x.x.ec2.internal:9000
QE_HBOOT_FS_DEFAULTFS=hdfs://ip-10-x.x.x.ec2.internal:9000
QE_HBOOT_FS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem

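Also, try following the guides here and here.

You also mentioned that a ZooKeeper port might not be open. For testing purposes I suggest you disable the firewall, because I have seen several problems before that were caused by a firewall blocking important ports.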
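On the port numbers from the question: in the ZooKeeper log they appear as the client address (`for client /10.x.x.x:36997`), which means they are source ports picked by the client's operating system. An EC2 security group inbound rule is written against the destination port (here 2181), so those high-numbered ports do not need to be opened. The small loopback sketch below illustrates the difference; `loopback_ports` is a made-up name, and the listening port is chosen by the OS here only to avoid clashing with a real service, whereas ZooKeeper listens on a fixed port.

```python
import socket


def loopback_ports():
    """Open one TCP connection on loopback.

    Returns (listen_port, client_source_port, port_seen_by_server).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # 0 lets the OS pick a free listening port
        server.listen(1)
        listen_port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", listen_port)) as client:
            conn, peer = server.accept()
            with conn:
                # The client's local port is what the server logs as the peer port.
                return listen_port, client.getsockname()[1], peer[1]


if __name__ == "__main__":
    listen_port, source_port, seen_port = loopback_ports()
    print("server listens on:", listen_port)
    print("client connected from source port:", source_port)
    print("server sees the client at port:", seen_port)
```

The port the server reports for the client always matches the client's own source port and is different from the listening port, which is the same pattern as the 369xx and 370xx entries in the log.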

answered 2013-08-12T20:33:48.990