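For several days I have been looking for a way to feed reduce output back into further map steps in Hadoop. My input data consists of objects of class A, and my output data of objects of class B. The problem is that the map step generates not only Bs but also new As.

Here is what I want to achieve: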

1.1 input: a list of As
1.2 map result: for each A a list of new As and a list of Bs is generated
1.3 reduce: filtered Bs are saved as output, filtered As are added to the map jobs

2.1 input: a list of As produced by the first map/reduce
2.2 map result: for each A a list of new As and a list of Bs is generated
2.3 ...

3.1 ...

You should get the basic idea.
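To make the intended data flow concrete, here is a minimal plain-Java sketch of the rounds listed above, without any Hadoop classes. Everything in it is an assumption for illustration: A is modeled as an Integer, B as a String, the map rule in `map` is made up, and the filtering step is left out. It only shows the loop in which Bs accumulate as output while new As become the next round's input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IterativeRounds {

    /** Result of mapping one A: the new As and the Bs it generated. */
    static final class MapResult {
        final List<Integer> newAs = new ArrayList<>();
        final List<String> bs = new ArrayList<>();
    }

    // Hypothetical map rule, for illustration only: every A yields one B,
    // and every A greater than 1 also yields one smaller A.
    static MapResult map(int a) {
        MapResult r = new MapResult();
        r.bs.add("b" + a);
        if (a > 1) {
            r.newAs.add(a - 1);
        }
        return r;
    }

    // Runs rounds until no new As are produced or maxRounds is reached.
    public static List<String> run(List<Integer> initialAs, int maxRounds) {
        List<String> output = new ArrayList<>();
        List<Integer> input = new ArrayList<>(initialAs);
        for (int round = 0; round < maxRounds && !input.isEmpty(); round++) {
            List<Integer> nextInput = new ArrayList<>();
            for (int a : input) {
                MapResult r = map(a);
                output.addAll(r.bs);        // "reduce": Bs are saved as output
                nextInput.addAll(r.newAs);  // "reduce": As feed the next round
            }
            input = nextInput;
        }
        return output;
    }

    public static void main(String[] args) {
        // Starting from a single A with value 3, with at most 10 rounds.
        System.out.println(run(Arrays.asList(3), 10)); // prints [b3, b2, b1]
    }
}
```

The `maxRounds` guard is there because nothing guarantees that the As ever run out; in a real job chain the same kind of limit would be needed in the driver.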

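I have read a lot about chaining, but I'm not sure how to combine ChainReducer and ChainMapper, or even whether that is the right approach.

So here is my question: how can I split the mapped data during the reduce step so that one part is saved as output and the other part is saved as new input data?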

1 Answer

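Try using MultipleOutputs. As the Javadoc puts it:

The MultipleOutputs class simplifies writing output data to multiple outputs

Case one: writing to additional outputs other than the job default output. Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.

Case two: to write data to different files provided by user

Usage pattern for job submission: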

 // Classes used here come from org.apache.hadoop.mapreduce,
 // its lib.input and lib.output subpackages, and org.apache.hadoop.io
 Job job = new Job();

 FileInputFormat.setInputPaths(job, inDir);
 FileOutputFormat.setOutputPath(job, outDir);

 job.setMapperClass(MOMap.class);
 job.setReducerClass(MOReduce.class);
 ...

 // Defines additional single text based output 'text' for the job
 MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
   LongWritable.class, Text.class);

 // Defines additional sequence-file based output 'seq' for the job
 MultipleOutputs.addNamedOutput(job, "seq",
   SequenceFileOutputFormat.class,
   LongWritable.class, Text.class);
 ...

 job.waitForCompletion(true);
 ...

Usage in Reducer:

 // Builds a base output file name from a key/value pair
 String generateFileName(K k, V v) {
   return k.toString() + "_" + v.toString();
 }

 public class MOReduce extends
   Reducer<WritableComparable, Writable, WritableComparable, Writable> {
   private MultipleOutputs mos;

   public void setup(Context context) {
     ...
     mos = new MultipleOutputs(context);
   }

   public void reduce(WritableComparable key, Iterable<Writable> values,
       Context context)
       throws IOException, InterruptedException {
     ...
     // Write to the named output 'text'
     mos.write("text", key, new Text("Hello"));
     // Write to the named output 'seq' under explicit base output paths
     mos.write("seq", new LongWritable(1), new Text("Bye"), "seq_a");
     mos.write("seq", new LongWritable(2), new Text("Chau"), "seq_b");
     // Write with the job's default output format to a generated file name
     mos.write(key, new Text("value"), generateFileName(key, new Text("value")));
     ...
   }

   public void cleanup(Context context) throws IOException, InterruptedException {
     // Close MultipleOutputs so that all named outputs are flushed
     mos.close();
     ...
   }

 }
answered 2013-01-13T18:48:47.213