You can write output to multiple locations from the Reducer using the MultipleOutputs class. Think of file1, file2, and file3 as three folders, with the output data of your 1000 Reducers written into each of them.
Usage pattern for job submission:
Job job = Job.getInstance();
FileInputFormat.addInputPath(job, inDir);
// outDir is the root path; in this case, outDir = "/home/user/data/"
FileOutputFormat.setOutputPath(job, outDir);
// You have to assign the output format class. Using MultipleOutputs in this way
// will still create zero-sized default output files, e.g. part-00000. To prevent
// this, use LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class)
// instead of job.setOutputFormatClass(TextOutputFormat.class) in your Hadoop job
// configuration.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MOMap.class);
job.setReducerClass(MOReduce.class);
...
job.waitForCompletion(true);
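MultipleOutputs also supports named outputs registered up front in the driver. This is not required for the baseOutputPath-only write() variant used below, but as a sketch of the alternative (assuming the same Text key/value types; the names "file1" etc. are just illustrative):
// Optional alternative: register named outputs in the driver. Each named
// output may carry its own output format and key/value classes.
MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "file2", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "file3", TextOutputFormat.class, Text.class, Text.class);
The reducer would then call out.write("file1", key, line, "file1/part") instead of the three-argument form shown below.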
Usage in the Reducer:
public static class MOReduce extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<Text, Text>(context);
        ...
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // '/' characters in baseOutputPath are translated into directory levels
        // in your file system. Also, append your custom-generated path with
        // "part" or similar, otherwise your output files will be named -00000,
        // -00001, etc. (see the expected layout after this snippet). No call to
        // context.write() is necessary.
        for (Text line : values) {
            // type1, type2, type3 stand in for your own classification values;
            // compare Text contents with equals(), not ==
            if (line.equals(type1))
                out.write(key, new Text(line), "file1/part");
            else if (line.equals(type2))
                out.write(key, new Text(line), "file2/part");
            else if (line.equals(type3))
                out.write(key, new Text(line), "file3/part");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close();
    }
}
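Hadoop appends the task type and part number to each baseOutputPath, so with outDir = "/home/user/data/" and the reducer above, the output should land roughly like this (one part file per reducer task in each folder):
/home/user/data/file1/part-r-00000
/home/user/data/file1/part-r-00001
...
/home/user/data/file2/part-r-00000
...
/home/user/data/file3/part-r-00000
...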
Reference: https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html