hadoop - Hadoop: When is the setup method called in a reducer?

Question

As I understand it, a reduce task has three phases:

Shuffle, Sort, and the actual reduce call.

So in the output of a Hadoop job we usually see something like map 0% reduce 0% ... map 20% reduce 0% ... map 90% reduce 10% ...

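So I assume that the reduce tasks start before all the maps have finished, and that this behavior is controlled by the slow-start configuration.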
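The slow-start behavior mentioned above can be pictured as a simple threshold check. This is a simplified sketch, assuming reducers become eligible for scheduling once the fraction of completed map tasks reaches the value of mapreduce.job.reduce.slowstart.completedmaps (the Hadoop 2+ property name, default 0.05). The real scheduler applies further resource checks, and SlowStartModel and reducersMayStart are hypothetical names used only for illustration.

```java
// Simplified model of the reduce slow-start rule (not Hadoop code).
// Assumption: reducers may be scheduled once completedMaps / totalMaps
// reaches the slow-start fraction.
final class SlowStartModel {

    private SlowStartModel() {}

    // Returns true once enough map tasks have finished for reducers to be scheduled.
    static boolean reducersMayStart(int completedMaps, int totalMaps, float slowStart) {
        if (totalMaps <= 0) {
            return true; // no map tasks: nothing to wait for
        }
        return (float) completedMaps / totalMaps >= slowStart;
    }

    public static void main(String[] args) {
        int totalMaps = 20;
        for (float slowStart : new float[] {0.05f, 0.80f, 1.0f}) {
            // Find the smallest number of completed maps that lets reducers start.
            int firstEligible = 0;
            while (!reducersMayStart(firstEligible, totalMaps, slowStart)) {
                firstEligible++;
            }
            System.out.println("slowstart=" + slowStart + " -> reducers may start after "
                    + firstEligible + " of " + totalMaps + " maps");
        }
    }
}
```

A reduce task launched early in this way only begins its shuffle (copy) phase; it still has to wait for the output of every map before it can move on.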

What I still don't understand is when the reducer's setup method is actually called.

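In my use case I have a file that I parse in the setup method. The file is about 60 MB and is pulled from the distributed cache. While the file is being parsed, another set of data from the configuration can update the record that was just parsed. After parsing and any updates, the records are stored in a HashMap for fast lookups. So I would like this method to be called as early as possible, ideally while the mappers are still doing their work.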
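Once the file is open, the loading step described above is ordinary Java, so it can be sketched independently of Hadoop. This sketch assumes a tab-separated "key, value" line format and represents the second data set from the configuration as a map of per-key overrides; LookupTableLoader and load are hypothetical names, and the real file format is not stated in the question. Inside a reducer, setup() would open the localized cache file (for example one of the files listed by context.getCacheFiles()) and pass a reader over it to this method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the "key<TAB>value" line format and the overrides map
// are assumptions made for illustration.
final class LookupTableLoader {

    private LookupTableLoader() {}

    // Parses every line into the table, replacing a value when an override exists for its key.
    static Map<String, String> load(Reader source, Map<String, String> overrides) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                String key = line.substring(0, tab);
                String value = line.substring(tab + 1);
                // Apply the update from the second data set, if there is one for this key.
                table.put(key, overrides.getOrDefault(key, value));
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        String file = "a\t1\nb\t2\nc\t3\n";
        Map<String, String> table = load(new StringReader(file), Map.of("b", "20"));
        System.out.println(table.get("a") + " " + table.get("b") + " " + table.get("c"));
    }
}
```

Reading through a BufferedReader keeps memory use down to the resulting HashMap itself, which matters for a file of this size.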

Is it possible to do this? Or does this already happen?

Thanks

score 1 · Accepted Answer

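setup is called just before the reducer is able to read the first key/value pair from the stream.

In practice, that is after all mappers have run and all merging for the given reducer's partition has finished.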
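The ordering described above can be illustrated with a small plain-Java model. It mirrors the shape of Reducer.run() in the new MapReduce API (setup(), then one reduce() call per key, then cleanup() in a finally block), but it is not Hadoop code: ReduceTaskModel is a hypothetical name, and the model collapses the shuffle and merge, which partly overlap in a real task, into a strict sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the event order inside one reduce task (not Hadoop code).
final class ReduceTaskModel {

    private ReduceTaskModel() {}

    // Each map output is this reducer's partition from one map task: key -> values.
    static List<String> run(List<Map<String, List<Integer>>> mapOutputs) {
        List<String> events = new ArrayList<>();

        // Shuffle: copy this reducer's partition from every map output.
        for (int i = 0; i < mapOutputs.size(); i++) {
            events.add("copy map " + i);
        }

        // Sort/merge: group all values by key, in sorted key order.
        TreeMap<String, List<Integer>> merged = new TreeMap<>();
        for (Map<String, List<Integer>> output : mapOutputs) {
            output.forEach((k, vs) -> merged.computeIfAbsent(k, x -> new ArrayList<>()).addAll(vs));
        }
        events.add("merge done");

        // Same shape as Reducer.run(): setup() happens only now, before the first key is read.
        events.add("setup");
        try {
            for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
                int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
                events.add("reduce " + e.getKey() + "=" + sum);
            }
        } finally {
            events.add("cleanup");
        }
        return events;
    }

    public static void main(String[] args) {
        List<Map<String, List<Integer>>> outputs = List.of(
                Map.of("b", List.of(1), "a", List.of(2)),
                Map.of("a", List.of(3)));
        run(outputs).forEach(System.out::println);
    }
}
```

However many keys the partition holds, "setup" appears exactly once and only after the last copy and merge event. That is why the expensive work in setup() cannot overlap with the map phase.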

score 0 · Accepted Answer

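As explained in the Hadoop documentation, the setup() method is called once at the start of the task. It should be used to instantiate resources/variables or to read configurable parameters that can then be used in the reduce() method. Think of it as a constructor.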

Here is an example reducer:

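import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;

class ExampleReducer extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {

    private int runId;
    private ObjectMapper objectMapper;

    // Called once per task, before the first call to reduce().
    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        this.runId = Integer.parseInt(conf.get("stackoverflow_run_id"));
        this.objectMapper = new ObjectMapper();
    }

    // Called once per key, reusing the fields initialized in setup().
    @Override
    protected void reduce(ImmutableBytesWritable keyFromMap, Iterable<ImmutableBytesWritable> valuesFromMap, Context context) throws IOException, InterruptedException {
        // your code: placeholder that counts the values and serializes the result as JSON
        int count = 0;
        for (ImmutableBytesWritable ignored : valuesFromMap) {
            count++;
        }
        String json = objectMapper.writeValueAsString(new int[] {runId, count});
        // "cf" and "json" are placeholder column family/qualifier names
        Put put = new Put(keyFromMap.copyBytes());
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(json));
        context.write(keyFromMap, put);
    }
}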
