Goal

Run our Scala Spark app jar in yarn-cluster mode. It works in standalone cluster mode and in yarn-client mode, but for some reason it does not run to completion in yarn-cluster mode.

Details

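The last part of the code that appears to execute is the assignment of the initial value to the DataFrame when the input file is read. Nothing seems to happen after that. Nothing in the logs looks abnormal, and there are no warnings or errors. The ApplicationMaster is suddenly unregistered with status SUCCEEDED and everything is killed. In every other deployment mode (e.g. yarn-client, standalone cluster mode), the job runs to completion without problems.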

15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED

I have also run this job on Spark 1.3.x and 1.4.x, on both a vanilla Spark/YARN cluster and a CDH 5.4.3 cluster. The results are the same in every case. What could the problem be?

The job is run with the following command, and the input file is accessible via HDFS.

bin/spark-submit --master yarn-cluster --class AssocApp ../associationRulesScala/target/scala-2.10/AssociationRule_2.10.4-1.0.0.SNAPSHOT.jar hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv

Code snippet

This is the code from the area where the DataFrame is loaded. It emits the log message "Uploading Dataframe ...", but nothing else after that. See the driver logs below.

//...
  logger.info("Uploading Dataframe from %s".format(filename))
  sparkParams.sqlContext.csvFile(filename)

  MDC.put("jobID",jobID.takeRight(3))
  logger.info("Extracting Unique Vals from each of %d columns...".format(frame.columns.length))
  private val uniqueVals = frame.columns.zipWithIndex.map(colname => (colname._2, colname._1, frame.select(colname._1).distinct.cache)).
//...

Driver logs

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-root/nm-local-dir/usercache/root/filecache/60/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/07/22 15:56:52 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/22 15:56:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1434116948302_0097_000001
15/07/22 15:56:55 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
15/07/22 15:56:56 INFO AssocApp$: Starting new Association Rules calculation. From File: hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
15/07/22 15:56:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
15/07/22 15:56:57 INFO associationRules.primaryPackageSpark: Uploading Dataframe from hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv 
15/07/22 15:56:57 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/22 15:56:57 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/22 15:56:57 INFO Remoting: Starting remoting
15/07/22 15:56:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@119.81.232.13:41459]
15/07/22 15:56:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 41459.
15/07/22 15:56:57 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/22 15:56:57 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/22 15:56:57 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/blockmgr-f0e66040-1fdb-4a05-87e1-160194829f84
15/07/22 15:56:57 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/22 15:56:58 INFO spark.HttpFileServer: HTTP File server directory is /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:56:58 INFO spark.HttpServer: Starting HTTP Server
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36021
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'HTTP file server' on port 36021.
15/07/22 15:56:58 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/22 15:56:58 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:53274
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'SparkUI' on port 53274.
15/07/22 15:56:58 INFO ui.SparkUI: Started SparkUI at http://119.XX.XXX.XX:53274
15/07/22 15:56:58 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/22 15:56:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34498.
15/07/22 15:56:59 INFO netty.NettyBlockTransferService: Server created on 34498
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/22 15:56:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager 119.81.232.13:34498 with 267.3 MB RAM, BlockManagerId(driver, 119.81.232.13, 34498)
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/22 15:56:59 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-819146876])
15/07/22 15:56:59 INFO client.RMProxy: Connecting to ResourceManager at sparkMaster-hk/119.81.232.24:8030
15/07/22 15:56:59 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/22 15:57:00 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
15/07/22 15:57:00 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1434116948302_0097
15/07/22 15:57:00 INFO storage.DiskBlockManager: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/userFiles-e01b4dd2-681c-4108-aec6-879774652c7a