
I'm trying to run a Spark/Scala self-contained app in Oozie. Please note that I'm using the CDH 5.13 Quickstart VM with 20 GB of RAM (containing Cloudera Manager, HUE, etc., and I upgraded Java from 7 to 8).

The code does pretty much nothing: it just creates a HiveContext and then creates a Hive table:

import org.apache.spark.{SparkConf, SparkContext}

object ThirdApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Third Application")
    val sc = new SparkContext(conf)

    // Create a HiveContext and use it to create a Hive table
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext.implicits._
    sqlContext.sql("CREATE TABLE IF NOT EXISTS default.src (key INT, value STRING)")
  }
}

sbt file:

name := "Third Project"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-hive" % "1.6.0"
)

The app worked fine when I submitted it from the shell, and the Hive table was created. But when I ran the same app in Oozie, it failed with memory issues.

Please note that I usually run Spark apps in Oozie and they work fine, except for this use case, which uses a HiveContext.

Here is the workflow.xml:

<workflow-app name="spark-scala" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-5a6a"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-5a6a">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local</master>
            <mode>client</mode>
            <name>MySpark</name>
              <class>ThirdApp</class>
            <jar>third-project_2.10-1.0.jar</jar>
            <file>/user/cloudera/oozie-spark/third-project_2.10-1.0.jar#third-project_2.10-1.0.jar</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
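One idea I'm considering (a sketch, not a verified fix, since the launcher fails with "PermGen space" before the Spark driver even starts): raising the launcher JVM's permanent generation through Oozie's oozie.launcher.* override mechanism, via a <configuration> element inside the spark action (per the spark-action schema, <configuration> goes after <name-node> and before <master>). The -Xmx and MaxPermSize values below are guesses that would need tuning:

```xml
<configuration>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx1g -XX:MaxPermSize=512m</value>
    </property>
</configuration>
```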

Here is the job.properties:

oozie.use.system.libpath=True
send_email=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032
security_enabled=False

Kindly be informed that I added spark to the superuser group (Cloudera Manager > Category > Security > Superuser group) to avoid permission issues:

Adding spark to superuser group (Cloudera Manager View)

hive-site.xml view

stdout logs:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space

ERROR org.apache.hadoop.mapred.YarnChild  - Error running child : java.lang.OutOfMemoryError: PermGen space

WARN  org.apache.hadoop.ipc.Client  - Unexpected error reading responses on connection Thread[IPC Client (1722336150) connection to /127.0.0.1:59738 from job_1547905343759_0002,5,main]

java.lang.OutOfMemoryError: PermGen space

INFO  org.apache.hadoop.mapred.Task  - Communication exception: java.io.IOException: The client is stopped

ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler  - Thread Thread[main,5,main] threw an Error.

stderr logs:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space
Halting due to Out Of Memory Error...

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

syslog:

INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1547905343759_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3a06520)
INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1547907649379, maxDate=1548512449379, sequenceNumber=6, masterKeyId=2)
INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/cloudera/appcache/application_1547905343759_0002
INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit@1ab7aa29
INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032

And I also looked for logs in Cloudera Manager > Logs > ERROR:

Exception in doCheckpoint
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): NameNode still not started
...(more)

Error starting JobHistoryServer
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://quickstart.cloudera:8020/user/history/done]
...
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): NameNode still not started
...(more)

SERVER[quickstart.cloudera] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000001-190120120522295-oozie-oozi-W] ACTION[0000001-190120120522295-oozie-oozi-W@spark-5a6a] XException, 
org.apache.oozie.command.CommandException: E0800: Action it is not running its in [KILLED] state, action [0000001-190120120522295-oozie-oozi-W@spark-5a6a]
    at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:92)
    at org.apache.oozie.command.XCommand.call(XCommand.java:257)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

getting attribute DatanodeNetworkCounts of Hadoop:service=DataNode,name=DataNodeInfo threw an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
    at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:342)
...More

Here is an (approximate) full view of the logs:

/var/log/spark/...log

/var/log/hadoop-hdfs/...log.out

I've tried to fix these problems by:

Increasing memory for map/reduce in mapred-site.xml:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2128</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2128</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>2128</value>
  </property>

Global View of mapred-site.xml

I also tried to increase the Java heap: View of Java Heap in Cloudera Manager

I also tried to set the Gateway Default Group: View of Client Java Configuration Options

And I tried adding an options list in the workflow that says: --driver-memory 5G

But it always gives the same error. Could you please help?
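One more thing I'm wondering about (an untested sketch): --driver-memory only raises the heap, while the OutOfMemoryError is in PermGen, a separate region that only exists on Java 7 (Java 8 replaced it with Metaspace, so the error also hints that the YARN containers may still be running Java 7 despite my upgrade). Perhaps PermGen options could be passed through the spark action's <spark-opts> element instead; the 512m value below is an assumption to tune:

```xml
<spark-opts>--driver-java-options "-XX:MaxPermSize=512m" --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=512m</spark-opts>
```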


1 Answer


I'm not sure about the memory issue, but I have seen "Permission denied" problems: for some reason the folder '/user/spark/applicationHistory/local-1547821006998' was owned by the user 'cloudera' instead of 'spark', so Spark could not write to it. To fix it, log in to the VM and add the user spark to the group supergroup: "usermod -G supergroup spark". Cheers, Doron

Answered 2019-01-19T18:52:08.567