hadoop - MapReduce output as ArrayList

Question

How can I call MapReduce methods from a plain Java project? Is it possible to return the reducer output as an ArrayList / HashMap instead of a flat file? And how can I access MapReduce methods from a JBoss app server?

score 0 · Accepted Answer

Here is a sample program that uses MultipleOutputs:

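    @SuppressWarnings("unchecked") // getCollector() returns a raw OutputCollector
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum all counts for this key.
        int total = 0;
        while (values.hasNext()) {
            total += values.next().get();
        }

        // Write the final total once per key to each named output.
        IntWritable result = new IntWritable(total);
        mos.getCollector("text", reporter).collect(key, result);
        mos.getCollector("seq", reporter).collect(key, result);
    }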
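In the reduce method, the total should be collected once, after the loop over the values; collecting inside the loop emits a running subtotal for every value instead of one total per key. The plain-Java sketch below compares the two behaviours for a single key. The class name `ReduceSumSketch` is hypothetical, and `Integer` stands in for `IntWritable` so that it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReduceSumSketch {

    // Mirrors the body of reduce(): sums every value for one key and returns
    // the single total that should be collected after the loop.
    static int total(Iterator<Integer> values) {
        int total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return total;
    }

    // What collecting inside the loop would emit instead: one running
    // subtotal per value rather than one total per key.
    static List<Integer> runningSubtotals(Iterator<Integer> values) {
        List<Integer> emitted = new ArrayList<>();
        int total = 0;
        while (values.hasNext()) {
            total += values.next();
            emitted.add(total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 1, 1);
        System.out.println(total(values.iterator()));            // 3
        System.out.println(runningSubtotals(values.iterator())); // [1, 2, 3]
    }
}
```

With three values of 1 for the same key, the correct version writes a single record with 3, whereas collecting inside the loop would write three records (1, 2, 3) for that key to each named output.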

You need to create a MultipleOutputs instance in the configure method.

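    private MultipleOutputs mos;

    @Override
    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    // Close MultipleOutputs so that all opened named outputs are closed too.
    @Override
    public void close() throws IOException {
        mos.close();
    }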

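In your driver class, you need to declare which output formats you want to use. The following produces your output in both text and SequenceFile format.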

    // Defines an additional text-based named output 'text' for the job
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
            Text.class, IntWritable.class);

    // Defines an additional SequenceFile-based named output 'seq' for the job
    MultipleOutputs.addNamedOutput(conf, "seq",
            SequenceFileOutputFormat.class, Text.class, IntWritable.class);

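But from what I understand of your question, you basically want to access your MapReduce output from your own code. You can use the HDFS API to read the output files. A better approach, though, is to put your data in a Hive table and access it over JDBC.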
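A minimal sketch of the HDFS-file route, assuming the job writes its results with the default TextOutputFormat (one `key<TAB>value` line per record) using `Text` keys and `IntWritable` values. The hypothetical class `TextOutputToMap` turns those lines into a `HashMap`; it only uses `java.io`, so it runs without Hadoop, and the sample string stands in for the contents of one output file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class TextOutputToMap {

    // Parses "key<TAB>value" lines, as written by TextOutputFormat with its
    // default tab separator, into a HashMap of key -> integer value.
    static Map<String, Integer> parse(Reader in) {
        Map<String, Integer> result = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The value is an int, so the last tab separates key from value.
                int tab = line.lastIndexOf('\t');
                if (tab < 0) {
                    continue; // ignore lines without a separator
                }
                String key = line.substring(0, tab);
                int value = Integer.parseInt(line.substring(tab + 1).trim());
                result.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of one reducer output file.
        String sample = "apple\t3\nbanana\t5\n";
        Map<String, Integer> counts = parse(new StringReader(sample));
        System.out.println(counts.get("apple") + " " + counts.get("banana")); // 3 5
    }
}
```

Against HDFS, the `Reader` could wrap the stream returned by `FileSystem.open(Path)`, e.g. `new InputStreamReader(fs.open(path), StandardCharsets.UTF_8)`, with the files of the output directory enumerated via `fs.listStatus(outputDir)`. Because each key is handled by exactly one reducer, the maps parsed from the individual output files can simply be merged with `putAll`.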
