hadoop - How do I get output data from Hadoop?

Question

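I created a jar that runs a MapReduce job and writes its output to a directory. I need to read the data in that output directory from my Java code, which does not run in the Hadoop environment, without copying it to a local directory. I am running the jar with ProcessBuilder. Can anyone help me?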

score 1 · Accepted Answer

You can add the following code to your MR driver to read the job's output.

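    // Block until the job has finished, then list its output files
    job.waitForCompletion(true);
    FileSystem fs = FileSystem.get(conf);
    // OutputFilesFilter (org.apache.hadoop.mapred.Utils.OutputFileUtils) keeps only the job's output files
    Path[] outputFiles = FileUtil.stat2Paths(fs.listStatus(output, new OutputFilesFilter()));

    for (Path file : outputFiles) {
        // try-with-resources closes the HDFS stream after each file
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            // ... read and process the lines here ...
        }
    }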
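
What goes in the elided loop body depends on how the job writes its records. As a minimal sketch, assuming the job uses the default `TextOutputFormat` (one `key<TAB>value` record per line), the `reader` from the loop above could be drained into a map as follows. `OutputLineParser` and `parse` are hypothetical names, not part of Hadoop, and the `main` method feeds in a `StringReader` so that the example runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: parses "key<TAB>value" lines.
final class OutputLineParser {

    // Reads every line from the reader and splits it at the first tab.
    // Assumes TextOutputFormat's default tab separator. A line without a tab
    // is stored with an empty value; a repeated key overwrites the earlier one.
    static Map<String, String> parse(BufferedReader reader) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.put(line, "");
            } else {
                records.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedReader(new InputStreamReader(fs.open(file)))
        BufferedReader reader = new BufferedReader(new StringReader("apple\t3\nbanana\t5\n"));
        System.out.println(parse(reader)); // prints {apple=3, banana=5}
    }
}
```

If the job uses a different output format or separator, the split logic would need to change accordingly.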

score 1 · Accepted Answer

What is the problem with reading the HDFS data through the HDFS API?

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Point the client at the cluster's configuration files
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // DataInputStream.readLine() is deprecated, so wrap the stream in a BufferedReader
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/mapout/input.txt"))))) {
            System.out.println(reader.readLine());
        }
    }

Your program may run outside your Hadoop cluster, but the Hadoop daemons must be running.
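
Since the question already launches the job with `ProcessBuilder`, another option (a different technique from the HDFS API shown above) is to stream the output through the `hadoop fs -cat` shell command, which prints HDFS files to standard output without copying them locally. The sketch below assumes the `hadoop` launcher is on the `PATH` of the calling process and that the job wrote `part-*` files. `HdfsCat` and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reads job output by running "hadoop fs -cat".
final class HdfsCat {

    // Builds the command line. ProcessBuilder does not go through a local shell,
    // so the glob reaches the hadoop command unchanged and is expanded against HDFS.
    static List<String> command(String outputDir) {
        String dir = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        return Arrays.asList("hadoop", "fs", "-cat", dir + "part-*");
    }

    // Runs the command and returns its standard output, one entry per line.
    static List<String> readLines(String outputDir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(outputDir));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep stderr out of the data
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("hadoop fs -cat exited with status " + exit);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // "/mapout" is the directory used in the answer above; pass another as args[0].
        String dir = args.length > 0 ? args[0] : "/mapout";
        for (String line : readLines(dir)) {
            System.out.println(line);
        }
    }
}
```

Each call starts a separate JVM for the `hadoop` command, so the `FileSystem` API is probably the better fit when the output is read often.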
