hadoop - How to include third-party jars when running a MapReduce job

Question

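As we know, we need to package all the required classes into the job jar and upload it to the server. This is very slow. I would like to know whether there is a way to specify third-party jars to include when running a MapReduce job, so that I only have to package my own classes without their dependencies.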

PS: I found that there is a "-libjars" option, but I don't know how to use it. Here is the link: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

score 3 · Accepted Answer

These are called generic options. To support them, your job should implement Tool.

Run your job like this:

hadoop jar yourfile.jar [mainClass] -libjars <comma separated list of jars> args
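The value given to -libjars is a single string of jar paths joined by commas, with no spaces. As a minimal sketch, assuming the dependency jars all sit in one local directory, that string could be assembled as follows. The class name `LibJarsArg` and the default `lib` directory are hypothetical and not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: builds the comma-separated
// value that follows -libjars from the *.jar files in one directory.
class LibJarsArg {

    // Joins jar paths with commas and no spaces.
    static String join(List<String> jarPaths) {
        return String.join(",", jarPaths);
    }

    // Lists the *.jar files directly inside libDir, sorted for a stable order.
    static List<String> jarsIn(Path libDir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p.toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument names the directory holding the jars.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(libDir)) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        System.out.println("-libjars " + join(jarsIn(libDir)));
    }
}
```

The printed text can be pasted into the hadoop jar command line in place of the placeholder list.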

Edit:

To implement Tool and extend Configured, you can do something like this in your MapReduce application:

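import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourClass extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (such as -libjars) and
        // passes only the remaining arguments to run().
        int res = ToolRunner.run(new YourClass(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Parse your normal arguments here.

        // Use the configuration prepared by ToolRunner so that the
        // generic options take effect.
        Configuration conf = getConf();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "Name of job");
        job.setJarByClass(YourClass.class);

        // Set the mapper, reducer and other class names here.

        // Set the output key and value classes here.

        // Accept the HDFS input and output directories at run time.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}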

score 0 · Accepted Answer

For me, I had to specify the -libjars option before the arguments. Otherwise, it is treated as an ordinary argument.
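This behaviour is consistent with a parser that stops looking for options at the first token that is not an option. The sketch below is a simplified, self-contained model of that rule, not Hadoop's actual parsing code. The class name `OptionOrderDemo` and the handling of only -libjars are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (not Hadoop code): options are recognised only until the
// first non-option token; everything from there on is an application argument.
class OptionOrderDemo {

    // Returns the value given to -libjars, or null if it was not seen before
    // the first non-option token. The remaining tokens are added to appArgs.
    static String parse(String[] argv, List<String> appArgs) {
        String libJars = null;
        int i = 0;
        while (i < argv.length) {
            if (argv[i].equals("-libjars") && i + 1 < argv.length) {
                libJars = argv[i + 1];
                i += 2;
            } else {
                break; // first non-option token: stop option parsing
            }
        }
        appArgs.addAll(Arrays.asList(argv).subList(i, argv.length));
        return libJars;
    }

    public static void main(String[] args) {
        // Option first: it is recognised, and only "in" and "out" remain.
        List<String> rest = new ArrayList<>();
        String jars = parse(new String[] {"-libjars", "a.jar,b.jar", "in", "out"}, rest);
        System.out.println(jars + " " + rest); // a.jar,b.jar [in, out]

        // Option last: it is never recognised and ends up among the arguments.
        rest = new ArrayList<>();
        jars = parse(new String[] {"in", "out", "-libjars", "a.jar,b.jar"}, rest);
        System.out.println(jars + " " + rest); // null [in, out, -libjars, a.jar,b.jar]
    }
}
```

In the second call the job would see four arguments instead of two, which matches the symptom described in this answer.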
