I'm trying to run a self-contained Spark (Scala) application in Oozie. I'm using the CDH 5.13 QuickStart VM with 20 GB of RAM (it includes Cloudera Manager, Hue, etc.), and I upgraded Java from 7 to 8.
The code does pretty much nothing: it just creates a HiveContext and then creates a Hive table:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ThirdApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Third Application")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext.implicits._
    sqlContext.sql("CREATE TABLE IF NOT EXISTS default.src (key INT, value STRING)")
  }
}
sbt file:
name := "Third Project"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.6.0",
"org.apache.spark" %% "spark-hive" % "1.6.0")
The app worked fine when I submitted it from the shell, and the Hive table was created. But when I ran the same app in Oozie, it failed with memory errors.
Please note that I'm used to running Spark apps in Oozie and they work fine, except for this use case, which uses HiveContext.
Here is the workflow.xml:
<workflow-app name="spark-scala" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-5a6a"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="spark-5a6a">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>local</master>
      <mode>client</mode>
      <name>MySpark</name>
      <class>ThirdApp</class>
      <jar>third-project_2.10-1.0.jar</jar>
      <file>/user/cloudera/oozie-spark/third-project_2.10-1.0.jar#third-project_2.10-1.0.jar</file>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>
Here is the job.properties:
oozie.use.system.libpath=True
send_email=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032
security_enabled=False
I also added spark to the superuser group in Cloudera Manager (Category > Security > Superuser group) to avoid permission issues:
Adding spark to superuser group (Cloudera Manager View)
stdout logs:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space
ERROR org.apache.hadoop.mapred.YarnChild - Error running child : java.lang.OutOfMemoryError: PermGen space
WARN org.apache.hadoop.ipc.Client - Unexpected error reading responses on connection Thread[IPC Client (1722336150) connection to /127.0.0.1:59738 from job_1547905343759_0002,5,main]
java.lang.OutOfMemoryError: PermGen space
INFO org.apache.hadoop.mapred.Task - Communication exception: java.io.IOException: The client is stopped
ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler - Thread Thread[main,5,main] threw an Error.
stderr logs:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
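PermGen is a memory region that exists only in Java 7 and earlier (Java 8 replaced it with Metaspace), so a "PermGen space" error suggests the Oozie launcher may still be running on the old JVM despite the upgrade. A minimal sketch for checking this, assuming a hypothetical helper object JvmInfo (not part of the app above) whose describe output is printed from the first line of ThirdApp.main so that it shows up in the launcher's stdout log:

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Hypothetical diagnostic helper, not part of the original app.
object JvmInfo {
  // Names of the memory pools reported by the running JVM.
  // On HotSpot, Java 7 reports a "Perm Gen" pool; Java 8 reports "Metaspace" instead.
  def memoryPoolNames: Seq[String] =
    ManagementFactory.getMemoryPoolMXBeans.asScala.map(_.getName).toList

  // One-line summary of the JVM that is actually executing this code.
  def describe: String =
    s"java.version=${System.getProperty("java.version")} " +
      s"java.home=${System.getProperty("java.home")} " +
      s"pools=${memoryPoolNames.mkString(", ")}"

  def main(args: Array[String]): Unit = println(describe)
}
```

If the reported java.version starts with 1.7, the launcher container is not using the upgraded Java, and heap settings such as --driver-memory would not affect PermGen at all.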
syslog:
INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1547905343759_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3a06520)
INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1547907649379, maxDate=1548512449379, sequenceNumber=6, masterKeyId=2)
INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/cloudera/appcache/application_1547905343759_0002
INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit@1ab7aa29
INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
I also looked at the logs in Cloudera Manager > Logs > ERROR:
Exception in doCheckpoint
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): NameNode still not started
...(more)
Error starting JobHistoryServer
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://quickstart.cloudera:8020/user/history/done]
...
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): NameNode still not started
...(more)
SERVER[quickstart.cloudera] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000001-190120120522295-oozie-oozi-W] ACTION[0000001-190120120522295-oozie-oozi-W@spark-5a6a] XException,
org.apache.oozie.command.CommandException: E0800: Action it is not running its in [KILLED] state, action [0000001-190120120522295-oozie-oozi-W@spark-5a6a]
at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:92)
at org.apache.oozie.command.XCommand.call(XCommand.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
getting attribute DatanodeNetworkCounts of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
...More
Here is an (approximate) full view of the logs:
/var/log/hadoop-hdfs/...log.out
I've tried to fix these problems by:
Increasing memory for map/reduce in mapred-site.xml:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2128</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2128</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2128</value>
</property>
Global View of mapred-site.xml
I also tried increasing the Java heap: View of Java Heap in Cloudera Manager
I also tried setting the Client Java Configuration Options for the Gateway Default Group: View of Client Java Configuration Options
And I tried adding an entry to the Options list in the workflow: --driver-memory 5G
But it always gives the same error. Could you please help?