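I have a requirement: I want to share a variable between the mapper and reducer classes. The scenario is as follows:

Suppose my input records are of types A, B, and C. I process these records and generate keys and values for output.collect in the map function accordingly. At the same time, I have declared three static int variables in the mapper class to keep counts of the records of type A, B, and C. These variables will be updated by the various map threads. When all the map tasks are done, I want to pass these three values to the reduce function.

How can this be achieved? I tried overriding the close() method, but it is called after each map function executes rather than after all map functions have finished. Or is there some other way to share variables? I want to output the total number of records of each type along with whatever processed output I am already displaying.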

2 Answers

Counters exist for a specific reason, i.e. to keep count of some specific state, for example "NUMBER_OF_RECORDS_DISCARDED". I believe one can only increment these counters and not set them to an arbitrary value (I may be wrong here). They can certainly be used as message passers, but for handing a value to the tasks there is a better way: set a variable in the job configuration. Note that this can only be used to pass a custom message to the mapper or reducer; changes made in the mapper will not be available in the reducer.

Setting the message/variable using the old mapred API:

JobConf job = (JobConf) getConf();
job.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");

Setting the message/variable using the new mapreduce API:

Configuration conf = new Configuration();
conf.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");
// On Hadoop 2 and later, Job.getInstance(conf) replaces this deprecated constructor.
Job job = new Job(conf);

Getting the message/variable using the old API in the Mapper and Reducer: configure() has to be implemented in the Mapper and Reducer classes, and the value can then be assigned to a class member so that it can be used inside map() or reduce().

...
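private String awesomeMessage;
// Called once per task, before any map() or reduce() call.
@Override
public void configure(JobConf job) {
    // The value was set as a String, so it is read back as a String.
    awesomeMessage = job.get("messageToBePassed-OR-anyValue");
}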
...

The variable awesomeMessage can then be used inside the map and reduce functions.

Getting the message/variable using the new API in the Mapper and Reducer: a similar thing needs to be done here, in setup().

Configuration conf = context.getConfiguration();
String param = conf.get("messageToBePassed-OR-anyValue");
answered 2013-01-07T21:26:57.253

Got a solution.

I used counters. They are accessible through the Reporter class in both the Mapper and the Reducer.
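
For anyone wondering why counters work where static variables do not: map tasks normally run in separate JVMs, so a static int in the mapper class is only ever a per-task value, whereas counter increments are reported by each task and summed by the framework. Below is a rough plain-Java model of that idea with no Hadoop dependency. The names CounterModel, RecordType, runMapTask and aggregate are made up for illustration and are not Hadoop APIs; in the old mapred API the real increment would be a call such as reporter.incrCounter(RecordType.A, 1).

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of enum-keyed counters; all names here are hypothetical.
class CounterModel {
    // Counter names for the three record types from the question.
    enum RecordType { A, B, C }

    // Stand-in for one map task: counts the record types in its own input split.
    static Map<RecordType, Long> runMapTask(List<RecordType> split) {
        Map<RecordType, Long> local = new EnumMap<>(RecordType.class);
        for (RecordType type : split) {
            local.merge(type, 1L, Long::sum); // analogous to incrementing a counter by 1
        }
        return local;
    }

    // Stand-in for the framework: sums the per-task counts into job-wide totals.
    static Map<RecordType, Long> aggregate(List<Map<RecordType, Long>> perTask) {
        Map<RecordType, Long> totals = new EnumMap<>(RecordType.class);
        for (Map<RecordType, Long> task : perTask) {
            task.forEach((type, n) -> totals.merge(type, n, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two independent "tasks", each seeing only its own records.
        Map<RecordType, Long> task1 =
                runMapTask(Arrays.asList(RecordType.A, RecordType.B, RecordType.A));
        Map<RecordType, Long> task2 =
                runMapTask(Arrays.asList(RecordType.C, RecordType.A));
        System.out.println(aggregate(Arrays.asList(task1, task2))); // prints {A=3, B=1, C=1}
    }
}
```

Neither task alone knows the totals; only the aggregated result does, which is the part a static variable cannot give you.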

answered 2013-01-07T14:45:18.133