I'm having a issue getting started with my first map-reduce code on Hadoop. I copied the following code from "Hadoop: The definitive guide" but I'm not able to run it on my single node Hadoop installation.
My Code snippet:
Main:
Job job = new Job();
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Mapper:
public void map(LongWritable key, Text value, Context context)
Reducer:
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
Implementations of map and reduce function are also picked from the book only. But when I try to execute this code, this is the error I get:
INFO mapred.JobClient: Task Id : attempt_201304021022_0016_m_000000_0, Status : FAILED
java.lang.ClassCastException: interface javax.xml.soap.Text
at java.lang.Class.asSubclass(Class.java:3027)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:774)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:959)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Answers to similar questions in the past (Hadoop type mismatch in key from map expected value Text received value LongWritable) helped me to figure out that InputFormatClass should match the input to the map function. So I also tried using job.setInputFormatClass(TextInputFormat.class); in my main method, but it also did not solve the issue. What could be the issue here?
Here is the implementation of the Mapper class
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19);
int airTemperature;
if (line.charAt(45) == '+') { // parseInt doesn't like leading plus signs
airTemperature = Integer.parseInt(line.substring(46, 50));
} else {
airTemperature = Integer.parseInt(line.substring(45, 50));
}
String quality = line.substring(50, 51);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}