hadoop - Does HFileOutputFormat start a reducer?

Question

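I am using HFileOutputFormat to bulk load CSV files into an HBase table. The job has only a map phase; I disabled reduce tasks with job.setNumReduceTasks(0). Nevertheless, I can see a reducer running in the job. Is this reducer started because of HFileOutputFormat?
Previously I used TableOutputFormat for the same job, and no reducer ever ran. I recently refactored the map task to use HFileOutputFormat, and since that change I can see a reducer running.

Second, I get the following error in the reducer, which I never got with TableOutputFormat. Is this also related to HFileOutputFormat?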

Error: java.lang.ClassNotFoundException: com.google.common.base.Preconditions

score 2 · Accepted Answer

HFileOutputFormat does indeed start a reduce task, which is necessary for producing the HFiles.

The error occurs because Hadoop needs Google's Guava library to produce the HFiles. The easiest way to make Hadoop find the library is to copy it from $HBASE_HOME/lib/ to $HADOOP_HOME/lib/. Look for guava-<version>.jar.
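To confirm whether Guava is visible to a given JVM, a small check like the one below may help. It is only a sketch: the class name ClasspathCheck and the method isOnClasspath are hypothetical names introduced here, and it only tests the classpath of the JVM it runs in, so it would have to run with the same classpath as the reduce task to be meaningful.

```java
// Minimal sketch (hypothetical helper, not part of Hadoop or HBase):
// reports whether a class can be located on the current classpath.
final class ClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the error message from the question
        String guavaClass = "com.google.common.base.Preconditions";
        System.out.println(guavaClass + " available: " + isOnClasspath(guavaClass));
    }
}
```

If this reports that the class is unavailable, the Guava jar is not on that JVM's classpath.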

score 0 · Accepted Answer

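Yes. Even if the number of reducers is set to zero, HFileOutputFormat starts a reduce phase to sort and merge the mapper output so that the resulting files are compatible with the HTable. The number of reducers equals the number of regions in the HBase table.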
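As far as I know, this behavior is set up by HFileOutputFormat.configureIncrementalLoad, which installs a sorting reducer and a total-order partitioner and sets the reduce task count to the number of regions, overriding an earlier job.setNumReduceTasks(0). The sketch below only illustrates the idea of one reducer per region. It is not HBase's partitioner: the class RegionPartitionSketch, the method regionFor, and the sample region start keys are hypothetical, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.Arrays;

// Illustrative sketch only (hypothetical names; not HBase's actual partitioner).
// Each row key is routed to the reducer that corresponds to the region owning it,
// so the number of partitions equals the number of regions.
final class RegionPartitionSketch {

    /**
     * Returns the index of the region that owns rowKey.
     * regionStartKeys must be sorted ascending, and its first entry must be ""
     * (the start key of the first region).
     */
    static int regionFor(String rowKey, String[] regionStartKeys) {
        int pos = Arrays.binarySearch(regionStartKeys, rowKey);
        // Exact match: rowKey is itself a region start key.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the owning
        // region is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        String[] regionStartKeys = {"", "g", "n"}; // hypothetical table with 3 regions
        String[] rowKeys = {"zebra", "apple", "kiwi", "g", "mango"};
        for (String key : rowKeys) {
            System.out.println(key + " -> reducer " + regionFor(key, regionStartKeys));
        }
    }
}
```

Because every reducer receives only the keys of a single region and writes them in sorted order, each output file falls within one region's key range, which is what a bulk load needs.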

Sample code for preparing data for an HBase bulk load with a MapReduce job can be found here.
