hadoop - HADOOP - MapReduce - I get the same value for all keys

Question

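I have a question about MapReduce. Given a list of songs ("Songname"#"UserID"#"boolean") as input, I have to produce a list of songs that states how many different users listened to each one, so the output is ("Songname", "times listened"). I used a Hashtable so that each user is counted only once per song. With short files it works well, but when I feed in a list of about 1,000,000 records, it returns the same value (20) for every record.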

Here is my mapper:

    public static class CanzoniMapper extends Mapper<Object, Text, Text, IntWritable> {

        private IntWritable userID = new IntWritable(0);
        private Text song = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Input line format: Songname#UserID#boolean
            String[] caratteri = value.toString().split("#");
            // Emit (song, userID) only when the flag is "1"
            if (caratteri[2].equals("1")) {
                song.set(caratteri[0]);
                userID.set(Integer.parseInt(caratteri[1]));
                context.write(song, userID);
            }
        }
    }

Here is my reducer:

    public static class CanzoniReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Keyed by user ID, so each user appears at most once per song
            Hashtable<IntWritable, Text> doppioni = new Hashtable<IntWritable, Text>();
            for (IntWritable val : values) {
                doppioni.put(val, key);
            }
            // The table size is the number of distinct users for this song
            result.set(doppioni.size());
            doppioni.clear();
            context.write(key, result);
        }
    }
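For comparison, the counting that the mapper and reducer are meant to perform can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: the class name `DistinctListeners`, the method `countDistinctUsers`, and the sample records are made up here and are not part of the job above. It stores primitive user IDs in a `HashSet<Integer>`; in a real reducer the equivalent would be to store `val.get()` rather than the `IntWritable` itself, since Hadoop reuses the same value object while iterating.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch (no Hadoop): count distinct listeners per song.
// Class and method names are hypothetical, chosen for this illustration.
class DistinctListeners {

    // Each record has the form Songname#UserID#flag; only flag "1" counts as a listen.
    static Map<String, Integer> countDistinctUsers(List<String> records) {
        Map<String, Set<Integer>> usersBySong = new HashMap<>();
        for (String record : records) {
            String[] fields = record.split("#");
            if (fields.length < 3 || !fields[2].equals("1")) {
                continue; // skip short lines and records whose flag is not "1"
            }
            int userId = Integer.parseInt(fields[1]);
            // A set keeps each user ID at most once per song
            usersBySong.computeIfAbsent(fields[0], k -> new HashSet<>()).add(userId);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> e : usersBySong.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: user 1 listens to SongA twice but is counted once
        List<String> records = Arrays.asList(
                "SongA#1#1", "SongA#1#1", "SongA#2#1", "SongB#3#0", "SongB#4#1");
        System.out.println(countDistinctUsers(records));
    }
}
```

Running the same small input through the Hadoop job and through this sketch is one way to check whether the job's logic or the input data is responsible for an unexpected result.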

Main:

    Configuration conf = new Configuration();

    Job job = new Job(conf, "word count");
    job.setJarByClass(Canzoni.class);
    job.setMapperClass(CanzoniMapper.class);
    //job.setCombinerClass(CanzoniReducer.class);
    //job.setNumReduceTasks(2);
    job.setReducerClass(CanzoniReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

Any ideas?

score 0 · Accepted Answer

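I think I may have solved it. It was an input problem. There were far too many records compared with the number of songs, so in that list of records every song had been listened to at least once by every user. My test data had 20 distinct users, so the result naturally came out as 20 for every song. I need to increase the number of distinct songs.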
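The effect can be reproduced outside Hadoop with a small simulation. The sketch below is an assumption-laden illustration, not the original test data: the 20 users and roughly 1,000,000 records come from the post, while the class name `SaturationDemo`, the song counts (50 and 500,000), the uniform random draws, and the seed are invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Plain-Java sketch (no Hadoop): distinct listeners per song on synthetic data.
// All sizes except the 20 users and 1,000,000 records are made-up values.
class SaturationDemo {

    // Draws `records` uniformly random (song, user) pairs and returns,
    // for each song that appears, the number of distinct users seen.
    static Map<Integer, Integer> distinctUsersPerSong(int songs, int users, int records, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, Set<Integer>> usersBySong = new HashMap<>();
        for (int i = 0; i < records; i++) {
            int song = rnd.nextInt(songs);
            int user = rnd.nextInt(users);
            usersBySong.computeIfAbsent(song, k -> new HashSet<>()).add(user);
        }
        Map<Integer, Integer> counts = new HashMap<>();
        usersBySong.forEach((song, set) -> counts.put(song, set.size()));
        return counts;
    }

    public static void main(String[] args) {
        // Few songs, many records: each song is very likely heard by all 20 users
        System.out.println(distinctUsersPerSong(50, 20, 1_000_000, 42L));
        // Many songs, same number of records: the distinct counts now vary
        Map<Integer, Integer> sparse = distinctUsersPerSong(500_000, 20, 1_000_000, 42L);
        System.out.println(new HashSet<>(sparse.values()));
    }
}
```

With only a handful of songs, each (song, user) pair is expected to occur many times, so the distinct-user count saturates at the number of users. Spreading the same number of records over far more songs leaves only a few records per song, and the counts start to differ.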
