This is my Map:
public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -20);
        String country = fields[4];
        String numClaims = fields[8];
        if (numClaims.length() > 0 && !numClaims.startsWith("\"")) {
            context.write(new Text(country), new Text(numClaims + ",1"));
        }
    }
}
This is my Reduce:
public void reduce(Text key, Iterator<Text> values, Context context) throws IOException, InterruptedException {
    double sum = 0.0;
    int count = 0;
    while (values.hasNext()) {
        String[] fields = values.next().toString().split(",");
        sum += Double.parseDouble(fields[0]);
        count += Integer.parseInt(fields[1]);
    }
    context.write(new Text(key), new DoubleWritable(sum/count));
}
This is how the job is configured:
Job job = new Job(getConf());
job.setJarByClass(AverageByAttributeUsingCombiner.class);
job.setJobName("AverageByAttributeUsingCombiner");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MapClass.class);
// job.setCombinerClass(Combinber.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// job.setNumReduceTasks(0); // to not run the reducer
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
The input looks like this:
"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
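One way to check the mapper's field extraction without running a job is to apply the same split to a few lines in plain Java. The sketch below is only an illustration: `ParseCheck` and `mapLine` are hypothetical names, the split uses a limit of -1 (any negative limit keeps trailing empty fields, just like -20), and the third line in `main` is a made-up record with a CLAIMS value, since the sample rows above all have that column empty.

```java
public class ParseCheck {
    // Same extraction as the mapper: returns "country<TAB>claims,1",
    // or null when the mapper would skip the line.
    static String mapLine(String line) {
        String[] fields = line.split(",", -1); // negative limit keeps trailing empty fields
        String country = fields[4];            // COUNTRY column, quotes included
        String numClaims = fields[8];          // CLAIMS column
        if (numClaims.length() > 0 && !numClaims.startsWith("\"")) {
            return country + "\t" + numClaims + ",1";
        }
        return null;
    }

    public static void main(String[] args) {
        // Header line: CLAIMS starts with a quote, so it is skipped.
        System.out.println(mapLine("\"PATENT\",\"GYEAR\",\"GDATE\",\"APPYEAR\",\"COUNTRY\",\"POSTATE\",\"ASSIGNEE\",\"ASSCODE\",\"CLAIMS\",\"NCLASS\""));
        // Sample row from the input: CLAIMS is empty, so it is skipped.
        System.out.println(mapLine("3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,"));
        // Made-up row with CLAIMS = 5 (not from the real data set).
        System.out.println(mapLine("4000000,1976,6000,1975,\"AR\",\"\",12345,2,5,100"));
    }
}
```

Because the country field is not unquoted, the emitted key keeps its double quotes, which matches the quoted keys in the job output.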
The entire output of the MapReduce job looks like this:
"AR" 5,1
"AR" 9,1
"AR" 2,1
"AR" 15,1
"AR" 13,1
"AR" 1,1
"AR" 34,1
"AR" 12,1
"AR" 8,1
"AR" 7,1
"AR" 23,1
"AR" 3,1
"AR" 4,1
"AR" 4,1
How can I debug and fix this? I am learning Hadoop.
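One thing that may be worth checking: the reduce method above takes an Iterator<Text>, while the new-API Reducer in org.apache.hadoop.mapreduce declares reduce(KEYIN key, Iterable<VALUEIN> values, Context context). With a different parameter type the method is an overload rather than an override, so the framework would call the inherited default, which writes every value through unchanged. That would produce output that looks like raw map output. The sketch below reproduces the effect in plain Java without Hadoop; `BaseReducer`, `BrokenReducer`, `FixedReducer` and `run` are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for Hadoop's Reducer: callers use reduce(key, Iterable).
    static class BaseReducer {
        final List<String> out = new ArrayList<>();

        // Default behaviour: pass every value through unchanged (identity).
        public void reduce(String key, Iterable<String> values) {
            for (String v : values) {
                out.add(key + "\t" + v);
            }
        }
    }

    // Iterator instead of Iterable: an overload, so the base method is never replaced.
    static class BrokenReducer extends BaseReducer {
        public void reduce(String key, Iterator<String> values) {
            double sum = 0.0;
            int count = 0;
            while (values.hasNext()) {
                String[] f = values.next().split(",");
                sum += Double.parseDouble(f[0]);
                count += Integer.parseInt(f[1]);
            }
            out.add(key + "\t" + (sum / count));
        }
    }

    // Iterable plus @Override: the compiler verifies that this replaces the base method.
    static class FixedReducer extends BaseReducer {
        @Override
        public void reduce(String key, Iterable<String> values) {
            double sum = 0.0;
            int count = 0;
            for (String v : values) {
                String[] f = v.split(",");
                sum += Double.parseDouble(f[0]);
                count += Integer.parseInt(f[1]);
            }
            out.add(key + "\t" + (sum / count));
        }
    }

    // Mimics the framework: always calls through the base type with an Iterable.
    static List<String> run(BaseReducer r) {
        r.reduce("AR", Arrays.asList("5,1", "9,1", "4,1"));
        return r.out;
    }

    public static void main(String[] args) {
        System.out.println(run(new BrokenReducer())); // three pass-through records
        System.out.println(run(new FixedReducer()));  // one averaged record
    }
}
```

Adding @Override to the real reduce method is a cheap way to test this hypothesis: if the signature does not match the one in Reducer, the code will no longer compile.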