eclipse - Hadoop Map Reduce program

Question

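When I try the MapReduce programming example from the book Hadoop in Action, which is based on the Hadoop 0.20 API, I get this error:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, received org.apache.hadoop.io.Text

As far as I can tell, I am passing everything correctly. Any help would be much appreciated.

Here is the code. It is the same as the code in the book.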

@SuppressWarnings("unused")
public class CountPatents extends Configured implements Tool {

    @SuppressWarnings("deprecation")
    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}

score 8 · Accepted Answer

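From a quick look (without running the code locally), it looks like you are setting the job's output value type to Text with job.setOutputValueClass(Text.class);, while the reducer's output value type is IntWritable. That is most likely the error.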

score 0 · Accepted Answer

The map emits <Text, Text>,

so set

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

See setMapOutputKeyClass and setMapOutputValueClass in the JobConf API.
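Why the two extra calls matter can be sketched in plain Java. The model below is not Hadoop code: MapOutputTypeSketch, JobSettings, and the empty Text and IntWritable stand-ins are hypothetical names invented for illustration. It assumes only the behavior discussed here, namely that an unset map output value class falls back to the job's final output value class, and that each value the mapper emits is compared against that class.

```java
import java.io.IOException;

/** Plain-Java model of the map output type check; none of these classes are Hadoop's. */
class MapOutputTypeSketch {

    // Hypothetical stand-ins for org.apache.hadoop.io.Text and IntWritable,
    // so the sketch compiles without Hadoop on the classpath.
    static final class Text {}
    static final class IntWritable {}

    /** Hypothetical, simplified model of the two job settings involved. */
    static final class JobSettings {
        Class<?> outputValueClass = Text.class; // assumed default, for this sketch only
        Class<?> mapOutputValueClass;           // null means "not set"

        // Models the fallback: an unset map output value class
        // resolves to the job's final output value class.
        Class<?> effectiveMapOutputValueClass() {
            return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
        }
    }

    /** Models the check applied to each value the mapper emits. */
    static void collectFromMap(JobSettings job, Object value) throws IOException {
        Class<?> expected = job.effectiveMapOutputValueClass();
        if (value.getClass() != expected) {
            throw new IOException("Type mismatch in value from map: expected "
                    + expected.getName() + ", received " + value.getClass().getName());
        }
    }

    /** Returns "ok", or the exception message, for a mapper that emits a Text value. */
    static String tryEmitText(JobSettings job) {
        try {
            collectFromMap(job, new Text());
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Only the final output value class is set: the map side inherits IntWritable,
        // so a Text value from the mapper is rejected.
        JobSettings onlyOutputSet = new JobSettings();
        onlyOutputSet.outputValueClass = IntWritable.class;
        System.out.println(tryEmitText(onlyOutputSet));

        // Map output value class declared explicitly: the Text value is accepted.
        JobSettings bothSet = new JobSettings();
        bothSet.outputValueClass = IntWritable.class;
        bothSet.mapOutputValueClass = Text.class;
        System.out.println(tryEmitText(bothSet));
    }
}
```

In this model the first configuration reproduces a message of the same shape as the one in the question, and the second passes, which is the effect of declaring the map output classes separately from the final output classes.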

score 0 · Accepted Answer

A call is missing:

job.setMapOutputValueClass(Text.class);

The same problem occurs with the new 0.20 interface, when the new "Job" object is used instead of JobConf.

score 0 · Accepted Answer

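The error should be in the reducer's output.

Your Reduce class is defined as follows:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable>

So the output value should be of type IntWritable.

However, you specified job.setOutputValueClass(Text.class);

So according to the configuration, the reducer's output value should be Text.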

Solution: in the configuration, add the following lines, which match the <Text, Text> pairs the mapper emits: job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(Text.class);

And change: job.setOutputValueClass(IntWritable.class);

Then try running it again.

score 0 · Accepted Answer

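In your reducer function you are using OutputCollector<Text, IntWritable>, which means the output key class is Text and the output value class is IntWritable. In the main (run) function, however, you have set job.setOutputKeyClass(Text.class); job.setOutputValueClass(Text.class);.

Change job.setOutputValueClass(Text.class) to job.setOutputValueClass(IntWritable.class) and you are good to go!

It is also better to set the map output key and value types explicitly, to avoid any discrepancy. Hadoop uses a mechanism based on the Writable interface rather than the native Java serialization mechanism. Unlike Java serialization, this mechanism does not encapsulate the class name in the serialized entity. Explicit class names are therefore needed to instantiate these classes on the way from the Mapper to the Reducer: a byte array representing a Writable instance cannot be deserialized without knowing which class it should be deserialized into (the Reducer's input key and value instances). This information has to be provided explicitly by calling setMapOutputKeyClass and setMapOutputValueClass on the Job instance.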
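The point about class names can be illustrated with the JDK alone. This is a minimal sketch rather than Hadoop's implementation: SerializationSketch and its method names are made up for illustration, and DataOutputStream.writeInt stands in for what a Writable's write(DataOutput) method does for a single int field. It contrasts those field-only bytes with standard Java serialization of an Integer, which writes a class descriptor into the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Contrasts Writable-style field-only bytes with Java serialization; uses only the JDK. */
class SerializationSketch {

    /** Writable-style: write just the field, with no type information. */
    static byte[] writeFieldOnly(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Java serialization: the stream carries a class descriptor as well as the value. */
    static byte[] writeWithJavaSerialization(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(count));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** The reader must already know the bytes hold one int; nothing in them says so. */
    static int readAsInt(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks whether the raw bytes contain the given class name as text. */
    static boolean mentionsClassName(byte[] raw, String className) {
        return new String(raw, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) {
        byte[] fieldOnly = writeFieldOnly(42);
        byte[] javaSerialized = writeWithJavaSerialization(42);

        System.out.println("field-only bytes: " + fieldOnly.length);
        System.out.println("java-serialized bytes: " + javaSerialized.length);
        System.out.println("class name in field-only bytes: "
                + mentionsClassName(fieldOnly, "java.lang.Integer"));
        System.out.println("class name in java-serialized bytes: "
                + mentionsClassName(javaSerialized, "java.lang.Integer"));
        System.out.println("read back with known type: " + readAsInt(fieldOnly));
    }
}
```

The field-only form is just the four bytes of the int, so whoever reads it has to be told the type in advance. That is the role the configured map output classes play between the map and reduce sides.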

score -1 · Accepted Answer

public class CountPatents extends Configured implements Tool {

    public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
        // Inverts each input pair so that the reducer groups by the original value.
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        // Counts how many values arrived for this key.
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int count = 0;
            while (values.hasNext()) {
                count = count + 1;
                values.next();
            }
            output.collect(key, new IntWritable(count));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, CountPatents.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        // The mapper emits <Text, Text>; declare that so the map output types
        // do not fall back to the final output types set below.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // The reducer emits <Text, IntWritable>.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new CountPatents(), args);
        System.exit(res);
    }
}
