4

我们用完了测试 HDFS 集群上的所有可用空间,因此 HBase 崩溃了。清理一些空间后,我们能够重新启动 HBase,但在启动后分布式日志拆分作业一直失败。这份工作看起来像这样:

Splitting log file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 into a temporary staging area.

Regionserver 正在尝试获取文件的租约一段时间:

2013-10-24 11:50:47,662 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
2013-10-24 11:50:47,671 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker host-4,60020,1382614844870 acquired task /hbase/splitlog/hdfs%3A%2F%2F192.168.249.1%3A9000%2Fhdfs%2Fhbase%2F.logs%2Fhost-3%2C60020%2C1382113928374-splitting%2Fhost-3%252C60020%252C1382113928374.1382523937002
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002, length=41274332
2013-10-24 11:50:47,672 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering lease on dfs file hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002
2013-10-24 11:50:47,673 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 1ms
2013-10-24 11:50:50,674 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=1 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 3002ms
2013-10-24 11:50:51,674 DEBUG org.apache.hadoop.hbase.util.FSHDFSUtils: isFileClosed not available
2013-10-24 11:51:51,680 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: recoverLease=false, attempt=2 on file=hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 after 64008ms

然后 Master 中止工作:

2013-10-24 11:55:48,685 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-10-24 11:55:48,687 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://192.168.249.1:9000/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 interrupted, resigning
java.io.InterruptedIOException
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
    ... 9 more

在我看来,问题在于 Regionserver 无法获得此文件的租约,因为它已经打开,所以我检查了sudo -u hdfs hadoop fsck /hdfs/hbase/.logs/ -openforwrite,它确认:

OPENFORWRITE: /hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002 41274332 bytes, 1 block(s), OPENFORWRITE:
/hdfs/hbase/.logs/host-3,60020,1382113928374-splitting/host-3%2C60020%2C1382113928374.1382523937002:  Under replicated blk_1073337163743094520_3534698. Target Replicas is 3 but found 2 replica(s).

我试图关闭 HBase,但文件保持打开状态。我怎样才能删除这个标志?

ps> Hadoop 1.0.1,HBase 0.94.12

4

0 回答 0