I am new to the concepts of MapReduce and Hadoop, so please help.
I have nearly 100 files containing data in this format:
conf/iceis/GochenouerT01a:::John E. Gochenouer::Michael L. Tyler:::Voyeurism, Exhibitionism, and Privacy on the Internet.
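Just to show the structure: one such line splits on the delimiters like this (plain Java outside Hadoop; the variable names are only for illustration, and punctuation stays attached to the words):

String line = "conf/iceis/GochenouerT01a:::John E. Gochenouer::Michael L. Tyler"
        + ":::Voyeurism, Exhibitionism, and Privacy on the Internet.";

String[] parts = line.split(":::");        // record id ::: author list ::: title
String[] authors = parts[1].split("::");   // "John E. Gochenouer", "Michael L. Tyler"
String title = parts[parts.length - 1];    // "Voyeurism, Exhibitionism, and Privacy on the Internet."

for (String author : authors) {
    for (String word : title.split(" ")) {
        System.out.println(author + "\t" + word);   // one (author, word) pair per line
    }
}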
I am supposed to do this with a MapReduce algorithm. The output I want to produce is:
John E. Gochenouer Voyeurism
John E. Gochenouer Exhibitionism
John E. Gochenouer and
John E. Gochenouer privacy
John E. Gochenouer on
John E. Gochenouer the
John E. Gochenouer internet
Michael L. Tyler Voyeurism
Michael L. Tyler Exhibitionism
Michael L. Tyler and
Michael L. Tyler privacy
Michael L. Tyler on
Michael L. Tyler the
Michael L. Tyler internet
So that is the output for a single input line; there are 'n' such lines, each with many names and many titles.
So if I consider a document of 110 lines, can my mapper produce output like this?
John E. Gochenouer Voyeurism 1
John E. Gochenouer Exhibitionism 3
Michael L. Tyler on 7
That is, the mapper should emit the name and the word together with the number of times that word occurs in the document, and finally, after the reduce, it should show each name with its words and the combined frequency of each word across the 'n' files.
I know about OutputCollector, but collect() takes only two arguments:
output.collect(arg0, arg1)
Is there any way to collect three values, such as the name, the word, and the word's occurrence count?
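The only idea I have come up with is to pack the name and the word into a single Text key and keep the count as the value, so that collect() still gets two arguments. This is just a sketch of that guess (the tab-separated composite key is my own assumption, not something I found documented), roughly what I imagine inside map():

// sketch only -- inside map(), after reading the input line
String[] fields = line.split(":::");
String title = fields[fields.length - 1];
for (String name : fields[1].split("::")) {
    StringTokenizer tokens = new StringTokenizer(title, " ");
    while (tokens.hasMoreTokens()) {
        // pack name and word into one Text key; the count stays in the IntWritable value
        word.set(name + "\t" + tokens.nextToken());
        output.collect(word, one);
    }
}

If I did that, my existing Reduce would sum the 1s and give the combined frequency of every (name, word) pair across all the files, and TextOutputFormat would then write name, word and count separated by tabs. But I do not know whether that is the right approach or whether there is a proper way to emit three separate values.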
Below is my code:
public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        /*
         * Plain word-count version, kept for reference:
         * StringTokenizer tokenizer = new StringTokenizer(line);
         * while (tokenizer.hasMoreTokens()) {
         *     word.set(tokenizer.nextToken());
         *     output.collect(word, one);
         * }
         */
        String[] strToSplit = line.split(":::");         // record id ::: authors ::: title
        String end = strToSplit[strToSplit.length - 1];  // the title
        String[] names = strToSplit[1].split("::");      // the author list
        for (String name : names) {
            StringTokenizer tokens = new StringTokenizer(end, " ");
            while (tokens.hasMoreElements()) {
                output.collect(arg0, arg1);              // <-- this is where I am stuck
                System.out.println(tokens.nextElement());
            }
        }
    }
}
public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(example.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, "/home/vishal/workspace/hw3data");
    FileOutputFormat.setOutputPath(conf, new Path("/home/vishal/nmnmnmnmnm"));

    JobClient.runJob(conf);
}