java - Finding the largest integer value in Hadoop (using Java)

Question

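I recently started working with Hadoop and have only learned some basic theory about it. I am trying to solve a task where the input is given in a text file, e.g. input.txt (1 10 37 5 4 98 100, etc.).

I need to find the largest integer (i.e. of integer type) in the given input. I am trying to load the input into an ArrayList so that I can compare the first integer with all of the remaining ones (using a for loop).

1) Is it possible to find the solution this way? If so, I have not been able to create an ArrayList in Hadoop and need some hints :-)

2) Can we print only the "key" instead of the key-value pair? If so, please help me. I tried to write code in the reduce function so that the value is not printed, but I ran into some errors.

Please give me some hints that would let me move forward. Thanks.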

score 0 · Accepted Answer

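In your map step, you can map all of the numbers to a single key. Then, in your reduce step, you can take the maximum. The reduce step is passed an iterable collection of the values for a given key, so there is no need to create your own ArrayList.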
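
The idea can be tried out without a cluster. The following is only a plain-Java imitation of the map, shuffle and reduce flow, not Hadoop code: the class name `SingleKeyMaxSketch`, the constant `KEY` and both method names are hypothetical, and a `HashMap` stands in for the grouping that the framework would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the map -> shuffle -> reduce flow; no Hadoop classes are used.
class SingleKeyMaxSketch {
    // Hypothetical constant key: every number is emitted under it.
    static final String KEY = "max";

    // "Map" step: emit (KEY, number) for each input number; the map stands in for the shuffle.
    static Map<String, List<Long>> mapPhase(long[] numbers) {
        Map<String, List<Long>> grouped = new HashMap<>();
        for (long n : numbers) {
            grouped.computeIfAbsent(KEY, k -> new ArrayList<>()).add(n);
        }
        return grouped;
    }

    // "Reduce" step: iterate over the values for one key and keep the largest.
    static long reducePhase(Iterable<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) {
            if (v > max) {
                max = v;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        long[] input = {1, 10, 37, 5, 4, 98, 100};
        Map<String, List<Long>> grouped = mapPhase(input);
        System.out.println(reducePhase(grouped.get(KEY))); // prints 100
    }
}
```

Because every number shares one key, the "reduce" side sees them all in a single iterable, which is why no hand-built list is needed.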

score 0 · Accepted Answer

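For this, you are better off with a single reducer.

To make sure that all of the numbers reach the same reducer, you have to do two things:

Emit the same key for all input values in the mapper
Set the number of reduce tasks to one.

Your map() method might look like this: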

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // Emit every number under the same constant key so that they all reach one reducer.
    context.write(new Text("MyAwesomeKey"), key); // assuming that your number is being read in the key
}
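
The comment above assumes the number arrives in the key. With Hadoop's default TextInputFormat the key is the byte offset of the line and the line itself arrives in the value, so for an input like the question's input.txt a mapper would more likely split the line and emit one number per token. A minimal plain-Java sketch of that splitting step follows; `LineTokenSketch` and `parseLongs` are hypothetical names, and silently skipping tokens that are not integers is an assumption about how the "integer type" requirement should be handled.

```java
import java.util.ArrayList;
import java.util.List;

class LineTokenSketch {
    // Hypothetical helper: split one input line on whitespace and keep the tokens that parse as integers.
    static List<Long> parseLongs(String line) {
        List<Long> numbers = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line
            }
            try {
                numbers.add(Long.parseLong(token));
            } catch (NumberFormatException e) {
                // Assumption: tokens that are not integers (e.g. "3.5", "abc") are ignored.
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parseLongs("1 10 37 5 4 98 100")); // prints [1, 10, 37, 5, 4, 98, 100]
    }
}
```

Inside a real mapper, each parsed number would be written under the shared key in place of the input key.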

In your Reduce class, keep a property max, for example: Long max = Long.MIN_VALUE;

The reduce() method might look like this:

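@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
    // max is a field of the Reduce class; it must be initialised (e.g. to Long.MIN_VALUE) before the first comparison.
    for (LongWritable value : values) {
        max = Math.max(max, value.get()); // keep the largest value seen so far
    }
}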

Then we also override run(), just as we override reduce():

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    context.write(new LongWritable(max), new Text("")); // write the max value
    cleanup(context);
}
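
The shape of this run() override, where reduce() only updates state and the single write happens after the loop, can be imitated in plain Java. This is only a sketch: `RunHookSketch` is a hypothetical class, a `Map` stands in for the grouped reducer input, and the return value stands in for the one `context.write(...)` call.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of a reducer whose run() loop emits once after all keys are processed.
class RunHookSketch {
    private long max = Long.MIN_VALUE; // state shared across reduce() calls

    // Called once per key; only updates the running maximum and emits nothing.
    void reduce(String key, Iterable<Long> values) {
        for (long v : values) {
            max = Math.max(max, v);
        }
    }

    // Mirrors the shape of run(): loop over the keys, then produce a single result.
    long run(Map<String, List<Long>> groupedInput) {
        for (Map.Entry<String, List<Long>> entry : groupedInput.entrySet()) {
            reduce(entry.getKey(), entry.getValue());
        }
        return max; // stands in for the single write after the loop
    }

    public static void main(String[] args) {
        Map<String, List<Long>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1L, 10L, 37L));
        grouped.put("b", Arrays.asList(5L, 4L, 98L, 100L));
        System.out.println(new RunHookSketch().run(grouped)); // prints 100
    }
}
```

Keeping the maximum in a field means the result is still one global value even if more than one key reaches the reducer.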

To set the number of reduce tasks to one, do the following in your job's run() (note that this is a different run() from the one above):

job.setNumReduceTasks(1);

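Note: all of the code above follows the new mapreduce API. I believe that with the old mapred API we would not have a single hook point after the reducer has finished its work, as we do here by overriding the Reducer's run().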
