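I am trying this Java code in MapReduce to compute a word count. After the reduce method has finished, I want to output only the single word that occurs the most times.
To do this, I created class-level variables named myoutput, mykey and completeSum.
I write this data out in the close method, but in the end I get an unexpected result.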
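import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every whitespace-separated token in the line.
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Class-level state: the largest sum seen so far, plus its collector and key.
    static int completeSum = -1;
    static OutputCollector<Text, IntWritable> myoutput;
    static Text mykey = new Text();

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        // Sum the counts for one word and remember it if it is the new maximum.
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            if (completeSum < sum) {
                completeSum = sum;
                myoutput = output;
                mykey = key;
            }
        }

        // Called once after all reduce() calls; write the remembered maximum.
        @Override
        public void close() throws IOException {
            super.close();
            myoutput.collect(mykey, new IntWritable(completeSum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        // conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}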
Input file data:
one
three three three
four four four four
six six six six six six six six six six six six six six six six six six
five five five five five
seven seven seven seven seven seven seven seven seven seven seven seven seven
The result should be:
six 18
But I got this result:
three 18
From the result I can see that the sum is correct, but the key is not.
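
A likely explanation for the symptom (right sum, wrong key) is that Hadoop reuses one Text instance for the key across reduce calls, so `mykey = key` keeps a reference whose contents later change to the last key processed. The following is only a minimal sketch of that effect in plain Java, with no Hadoop dependency: `KeyReuseDemo`, `MutableKey`, `sampleCounts`, `maxKeyByReference` and `maxKeyByCopy` are all hypothetical names made up for the illustration, and the counts are taken from the input file above.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyReuseDemo {

    // Hypothetical stand-in for a mutable, reused key object such as Text.
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Word counts from the sample input; TreeMap iterates in sorted key
    // order, similar to the order in which a reducer receives its keys.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("one", 1);
        counts.put("three", 3);
        counts.put("four", 4);
        counts.put("six", 18);
        counts.put("five", 5);
        counts.put("seven", 13);
        return counts;
    }

    // Mirrors "mykey = key": stores a reference to the reused object.
    static String maxKeyByReference(Map<String, Integer> counts) {
        MutableKey reused = new MutableKey();
        MutableKey best = null;
        int bestSum = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            reused.set(e.getKey());      // same object, new contents
            if (bestSum < e.getValue()) {
                bestSum = e.getValue();
                best = reused;           // reference only, not a copy
            }
        }
        return best + " " + bestSum;
    }

    // Stores a copy of the key's contents at the moment the maximum is found.
    static String maxKeyByCopy(Map<String, Integer> counts) {
        MutableKey reused = new MutableKey();
        String best = null;
        int bestSum = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            reused.set(e.getKey());
            if (bestSum < e.getValue()) {
                bestSum = e.getValue();
                best = reused.toString(); // snapshot of the contents
            }
        }
        return best + " " + bestSum;
    }

    public static void main(String[] args) {
        System.out.println(maxKeyByReference(sampleCounts())); // three 18
        System.out.println(maxKeyByCopy(sampleCounts()));      // six 18
    }
}
```

If this is indeed the cause, the corresponding change in the reducer would presumably be to copy the contents, for example `mykey.set(key)` instead of `mykey = key`. Note also that a static maximum is only global when the job runs with a single reducer.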