When submitting a workflow job for the workflow definition above, 3 workflow job properties must be specified:
jobTracker:
inputDir:
outputDir:
I have a PySpark script that specifies its input and output locations in the script itself. I don't need or want an inputDir and outputDir in my workflow XML.
When I run my PySpark script through Oozie, I get these warning messages:
WARN ParameterVerifier:523 - SERVER[<my_server>] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition
WARN JobResourceUploader:64 - SERVER[<my_server>] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-05-24 11:52:29,844 WARN JobResourceUploader:171 - SERVER[<my_server>] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
Based on https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/ParameterVerifier.java, my first warning seems to be caused by my not having an inputDir:
else {
// Log a warning when the <parameters> section is missing
XLog.getLog(ParameterVerifier.class).warn("The application does not define formal parameters in its XML "
+ "definition");
}
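For reference, my understanding is that the <parameters> section mentioned in that comment is declared at the top of the workflow definition, roughly like the fragment below. This is only a sketch: the workflow name oozie_test_wf, the 0.4 schema version, and the default queue value are placeholders I picked for illustration, and I have not confirmed that adding it makes the warning go away.

```xml
<workflow-app name="oozie_test_wf" xmlns="uri:oozie:workflow:0.4">
    <!-- Formal parameters: each property names a job property the workflow expects. -->
    <parameters>
        <property>
            <!-- A property with a <value> gets that value as its default. -->
            <name>queueName</name>
            <value>default</value>
        </property>
        <property>
            <!-- A property without a <value> must be supplied at submission time. -->
            <name>nameNode</name>
        </property>
    </parameters>
    <start to="spark-node"/>
    <!-- The spark-node action and the kill node "fail" would go here. -->
    <end name="end"/>
</workflow-app>
```

Note that nothing in this fragment declares inputDir or outputDir, which is what I am trying to avoid.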
Is there a way to work around this?
Update: my XML structure
<action name="spark-node">
<spark xmlns="uri:oozie:spark-action:0.1" >
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
<master>yarn-master</master>
<!-- <mode>client</mode> -->
<name>oozie_test</name>
<jar>oozie_test.py</jar>
<spark-opts>--num-executors 1 --executor-memory 10G --executor-cores 1 --driver-memory 1G</spark-opts>
</spark>
<ok to="end" />
<error to="fail" />
</action>