hadoop - HBase - Hadoop: TableInputFormat extension

Question

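I am using an HBase table as my input. I have preprocessed its keys so that each one consists of a number concatenated with the corresponding row ID, and I want to be sure that all rows whose keys start with the same number are processed by the same mapper in an M/R job. I know this can be achieved by extending TableInputFormat, and I have seen one or two posts about extending this class, but I am looking for the most efficient way to do it.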

If anyone has any ideas, please let me know.

score 0 · Accepted Answer

You can use a PrefixFilter in your scan, so that a job only reads the rows whose keys start with a given prefix: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html
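As a rough illustration of what a prefix scan covers: HBase stores rows sorted by key, so all rows sharing a prefix form one contiguous range [startRow, stopRow). The sketch below computes the exclusive stop row for a prefix. `PrefixRange` and `stopRowForPrefix` are hypothetical names written for this example, not part of the HBase API, and the `_` separator used in the usage note is an assumption about the key layout.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: derives the row range covered by a key prefix.
class PrefixRange {

    // Returns the smallest key that sorts after every key starting with `prefix`.
    // Returns an empty array when no such key exists (prefix is empty or all 0xFF bytes);
    // HBase conventionally treats an empty stop row as "to the end of the table".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        // Keep everything up to the last incrementable byte and bump that byte by one.
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }
}
```

For a prefix such as `42_`, the start row would be the prefix itself and the stop row the value returned by `stopRowForPrefix`. Bounding the scan this way in addition to the filter is commonly suggested so the scan does not have to read the rows that sort before the prefix, but verify the behaviour against your HBase version.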

Then parallelize the launch of the different jobs (one per prefix, each with its own mappers) using a Future:

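import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import org.apache.hadoop.mapreduce.Job;

// `executor` is an ExecutorService created elsewhere; MyJobBuilder is the
// answerer's own helper (not shown) that builds the job for one prefix.
final Future<Boolean> newJobFuture = executor.submit(new Callable<Boolean>() {
    @Override
    public Boolean call() throws Exception {
        Job mapReduceJob = MyJobBuilder.createJob(args, thePrefix,
                ...);
        // Blocks until the job finishes; returns true if it succeeded.
        return mapReduceJob.waitForCompletion(true);
    }
});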
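To show the plumbing the snippet above leaves out, here is a minimal, self-contained sketch of submitting one task per prefix and waiting for all of them. `runJobForPrefix` is a hypothetical stand-in for building and running the real MapReduce job (in the answer, `MyJobBuilder.createJob(...)` followed by `waitForCompletion(true)`); the class name, prefixes, and pool size are assumptions as well.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobLauncher {

    // Hypothetical stand-in for the real job: pretends any non-empty prefix succeeds.
    static boolean runJobForPrefix(String prefix) {
        return !prefix.isEmpty();
    }

    // Submits one task per prefix, then blocks until every task has reported its result.
    static Map<String, Boolean> runAll(List<String> prefixes, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
            for (final String prefix : prefixes) {
                futures.put(prefix, executor.submit(new Callable<Boolean>() {
                    @Override
                    public Boolean call() {
                        return runJobForPrefix(prefix);
                    }
                }));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> entry : futures.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    throw new IllegalStateException("interrupted while waiting for jobs", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("job failed for prefix " + entry.getKey(), e);
                }
            }
            return results;
        } finally {
            executor.shutdown(); // always release the worker threads
        }
    }
}
```

Note that this runs one job per prefix, so it only scales to a modest number of distinct prefixes; with many prefixes the per-job startup cost is likely to dominate.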

But I believe what you are actually looking for is closer to a reducer-based approach.
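A minimal sketch of the reducer-style alternative, under assumptions the question does not state: if the mapper emits the numeric prefix as its output key, the shuffle delivers every row with that prefix to a single reduce call. The code below only simulates that grouping in plain Java, without any Hadoop classes. The `_` separator between the number and the row ID, and the names `PrefixGrouping`, `numericPrefix`, and `groupByPrefix`, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixGrouping {

    // Assumes keys look like "<number>_<rowId>"; the separator is an assumption.
    static String numericPrefix(String rowKey) {
        int sep = rowKey.indexOf('_');
        if (sep <= 0) {
            throw new IllegalArgumentException("no numeric prefix in key: " + rowKey);
        }
        return rowKey.substring(0, sep);
    }

    // Simulates the shuffle: every row key ends up in the group of its prefix,
    // which is what a reducer would receive as one (key, values) pair.
    static Map<String, List<String>> groupByPrefix(List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String rowKey : rowKeys) {
            groups.computeIfAbsent(numericPrefix(rowKey), k -> new ArrayList<>()).add(rowKey);
        }
        return groups;
    }
}
```

In a real job the mapper would write the prefix as its output key and the framework would handle partitioning and grouping; that wiring is omitted here. The trade-off is that the rows cross the network during the shuffle, whereas a custom input format keeps the grouping on the map side.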
