java - How to use MultipleOutputFormat in Hadoop 0.20?

Question

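I am using Hadoop 0.20 and I want two reduce output files instead of one. I know that MultipleOutputFormat does not work in Hadoop 0.20, so I added the hadoop1.1.1-core jar file to the build path of my Eclipse project. It still fails with the error shown at the end.

Here is my code: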

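// Imports belong at the top of bitsetmr.java; BitSetWritable is the asker's own class.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public static class ReduceStage extends Reducer<IntWritable, BitSetWritable, IntWritable, Text>
{
    // Typed to match the reducer's output key/value classes.
    private MultipleOutputs<IntWritable, Text> mos;

    public ReduceStage() {
        System.out.println("ReduceStage");
    }

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<IntWritable, Text>(context);
    }

    @Override
    public void reduce(final IntWritable key, final Iterable<BitSetWritable> values, Context output) throws IOException, InterruptedException
    {
        // Writes to the named output "text1" registered in run().
        mos.write("text1", key, new Text("Hello"));
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Let close() failures propagate instead of swallowing them.
        mos.close();
    }
}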

In run():

FileOutputFormat.setOutputPath(job, ConnectedComponents_Nodes);
job.setOutputKeyClass(MultipleTextOutputFormat.class);
MultipleOutputs.addNamedOutput(job, "text1", TextOutputFormat.class,
                IntWritable.class, Text.class);

The error is:

java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputName(Lorg/apache/hadoop/mapreduce/JobContext;Ljava/lang/String;)V
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:409)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:370)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:348)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:179)
at bitsetmr$ReduceStage.reduce(bitsetmr.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

What can I do to get MultipleOutputFormat working? Is the code I am using correct?

score 0 · Accepted Answer

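First, check whether FileOutputFormat.setOutputName is the same in versions 0.20 and 1.1.1. If it is not, you need to compile and run your code against compatible versions. If it is the same, then some argument in your run command must be wrong.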

I ran into the same problem. I removed -Dmapreduce.user.classpath.first=true from my run command and it worked. Hope this helps!
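The version check suggested in this answer can be sketched with plain reflection. This is only a rough diagnostic, assuming the NoSuchMethodError comes from an older FileOutputFormat being picked up at runtime; the class name ClasspathCheck and its two helpers are hypothetical names introduced for illustration. It uses no Hadoop imports, so it compiles on its own, and it should be run with the same classpath as the job.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class ClasspathCheck {

    /** Returns true if the named class can be loaded and declares a method with this name. */
    public static boolean declaresMethod(String className, String methodName) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            // getDeclaredMethods also sees non-public methods.
            for (Method m : cls.getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the location (usually a jar) the class was loaded from, or a placeholder. */
    public static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace in the question.
        String cls = "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat";
        System.out.println("loaded from: " + loadedFrom(cls));
        System.out.println("declares setOutputName: " + declaresMethod(cls, "setOutputName"));
    }
}
```

If the reported location is a 0.20 jar, or setOutputName is reported as missing, the job is most likely compiled against one Hadoop version and run against another.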

score 0 · Accepted Answer

One option is to extend MultipleTextOutputFormat and put the entire contents of the record into the "value", while using the file name or path as the key.

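There is a library called oddjob that provides a number of output format implementations. The one you want is MultipleLeafValueOutputFormat: it writes to the file specified by the key, and writes only the value.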

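Now, suppose you have to write the following pairs and your separator is the tab character ('\t'): <"key1", "value1"> (which you want written to filename1) and <"key2", "value2"> (which you want written to filename2).

The reducer's output then becomes: <"filename1", "key1\tvalue1"> and <"filename2", "key2\tvalue2">

Also, do not forget to set the class mentioned above as the job's output format: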

conf.setOutputFormat(MultipleLeafValueOutputFormat.class);

One thing to note here is that you will need to use the old mapred package rather than the mapreduce package. That should not be a problem, though.
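The key/value rewrite described in this answer can be illustrated without Hadoop. The sketch below is a plain-Java simulation, not the library's implementation: LeafValueRouting, toRoutedPair and route are hypothetical names, and an in-memory map stands in for the output files. It shows only the idea of moving the original key into the value so that the key can carry the file name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeafValueRouting {

    /** Packs the original key into the value so the key slot can hold the file name. */
    public static String[] toRoutedPair(String fileName, String key, String value) {
        return new String[] { fileName, key + "\t" + value };
    }

    /** Simulates an output format that writes only the value to the file named by the key. */
    public static Map<String, List<String>> route(List<String[]> routedPairs) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] pair : routedPairs) {
            // pair[0] is the target file name, pair[1] is the line to write.
            files.computeIfAbsent(pair[0], name -> new ArrayList<>()).add(pair[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> reducerOutput = new ArrayList<>();
        reducerOutput.add(toRoutedPair("filename1", "key1", "value1"));
        reducerOutput.add(toRoutedPair("filename2", "key2", "value2"));
        route(reducerOutput).forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

In a real job the reducer would emit the routed pairs and the output format would do the work that route simulates here.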
