I am submitting a Spark application to YARN programmatically using org.apache.spark.deploy.yarn.Client (Spark 2.1.0), running the SparkPi example. Here are the relevant lines:

    // Arguments that spark-submit would otherwise pass through to the YARN Client
    List<String> arguments = Lists.newArrayList(
            "--class", "org.apache.spark.examples.SparkPi",
            "--jar", "path/to/spark examples jar",
            "--arg", "10");

    SparkConf sparkConf = new SparkConf();
    String applicationTag = "TestApp-" + new Date().getTime();
    sparkConf.set("spark.yarn.submit.waitAppCompletion", "false");
    sparkConf.set("spark.yarn.tags", applicationTag);
    sparkConf.set("spark.submit.deployMode", "cluster");
    sparkConf.set("spark.yarn.jars", "/opt/spark/jars/*.jar");

    System.setProperty("SPARK_YARN_MODE", "true");
    System.setProperty("SPARK_HOME", "/opt/spark");

    ClientArguments cArgs = new ClientArguments(arguments.toArray(new String[arguments.size()]));
    Client client = new Client(cArgs, sparkConf);
    client.run();

This seems to work: the Spark application shows up in the YARN RM UI and finishes successfully. However, the container logs show that the staging directory URL is picked up as SPARK_YARN_STAGING_DIR -> file:/home/{current user}/.sparkStaging/application_xxxxxx. Browsing through org.apache.spark.deploy.yarn.Client suggests the likely cause is that the base path of the staging directory is not being picked up correctly. The base path should be hdfs://localhost:9000/user/{current user}/ rather than file:/home/{current user}/, which is confirmed by the following error that appears in the logs when the staging directory is cleaned up:

    java.lang.IllegalArgumentException: Wrong FS: file:/home/user/.sparkStaging/application_1496908076154_0022, expected: hdfs://127.0.0.1:9000
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
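
For reference, a minimal check of which file system the submitting JVM resolves by default would look like the sketch below. My assumption here is that the root cause is a missing core-site.xml (and hence fs.defaultFS) on the classpath of the process that runs the Client:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // If no core-site.xml is visible on the classpath, fs.defaultFS falls back to
    // file:///, which would explain the staging dir ending up under file:/home/...
    Configuration hadoopConf = new Configuration();
    System.out.println("fs.defaultFS = " + hadoopConf.get("fs.defaultFS"));
    System.out.println("resolved FS  = " + FileSystem.get(hadoopConf).getUri());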

All of this works fine when submitting via spark-submit, presumably because it sets up all the required environment variables correctly.
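
To compare, the environment the embedded Client actually sees can be dumped like this (a small sketch; exactly which variables spark-submit exports, e.g. HADOOP_CONF_DIR / YARN_CONF_DIR, is my assumption):

    // spark-submit's launch scripts normally export these before starting the JVM;
    // printing them shows what the in-process submitter is (or is not) getting.
    for (String name : new String[] {"HADOOP_CONF_DIR", "YARN_CONF_DIR", "SPARK_HOME"}) {
        System.out.println(name + " = " + System.getenv(name));
    }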

I have also tried setting sparkConf.set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/{current user}"); but to no avail, since it leads to other errors, such as hdfs not being recognized as a valid file system.
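
In case it helps, this is the kind of configuration I would expect to push the right file system through to the Client (an untested sketch; it assumes that spark.hadoop.* properties are copied into the Hadoop Configuration the Client builds, and that the hadoop-hdfs classes are on the classpath):

    // Untested: point the default FS at HDFS and map the "hdfs" scheme explicitly,
    // so the staging dir resolves against HDFS instead of the local file system.
    sparkConf.set("spark.hadoop.fs.defaultFS", "hdfs://localhost:9000");
    sparkConf.set("spark.hadoop.fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    sparkConf.set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/{current user}");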
