shell - How to invoke an oozie workflow via shell script and block/wait till workflow completion

Question

I have created a workflow using Oozie that is comprised of multiple action nodes and have been successfully able to run those via coordinator.

I want to invoke the Oozie workflow via a wrapper shell script.

The wrapper script should invoke the Oozie command, wait till the oozie job completes (success or error) and return back the Oozie success status code (0) or the error code of the failed oozie action node (if any node of the oozie workflow has failed).

From what I have seen so far, I know that as soon as I invoke the oozie command to run a workflow, the command exits with the job id getting printed on linux console, while the oozie job keeps running asynchronously in the backend.

I want my wrapper script to block till the oozie coordinator job completes and return back the success/error code.

Can you please let me know how/if I can achieve this using any of the oozie features?

I am using Oozie version 3.3.2 and bash shell in Linux.

Note: In case anyone is curious about why I need such a feature - the requirement is that my wrapper shell script should know how long an oozie job has been runnig, when an oozie job has completed, and accordingly return back the exit code so that the parent process that is calling the wrapper script knows whether the job completed successfully or not, and if errored out, raise an alert/ticket for the support team.

score 5 · Accepted Answer

您可以通过使用作业 id 然后开始一个循环并解析 oozie info 的输出来做到这一点。下面是相同的shell代码。

开始 oozie 工作

oozie_job_id=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -run );
echo $oozie_job_id;
sleep 30;

从输出中解析作业 ID。这里的job_id格式是“job:jobid”

job_id=$(echo $oozie_job_id | sed -n 's/job: \(.*\)/\1/p');
echo $job_id;

定期检查作业状态，是否正在运行

while [ true ]
do
   job_status=$(oozie job --oozie http://<oozie-server>/oozie -info $job_id | sed -n 's/Status\(.*\): \(.*\)/\2/p');
    if [ "$job_status" != "RUNNING" ];
    then
        echo "Job is completed with status $job_status";
        break;
    fi
    #this sleep depends on you job, please change the value accordingly
    echo "sleeping for 5 minutes";
    sleep 5m
done

这是执行此操作的基本方法，您可以根据使用情况对其进行修改。

score 3 · Accepted Answer

要将工作流定义上传到 HDFS，请使用以下命令：

hdfs dfs -copyFromLocal -f workflow.xml /user/hdfs/workflows/workflow.xml

要启动 Oozie 作业，您需要下面的这两个命令。请注意，将每个命令写在一行上。

JOB_ID=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -submit)

oozie job -oozie http://<oozie-server>/oozie -start ${JOB_ID#*:} -config job.properties

返回时您需要解析来自以下命令的结果，result = 0否则将失败。每次试验后只需循环睡眠 X 时间。

oozie job -oozie http://<oozie-server>/oozie -info ${JOB_ID#*:}

echo $? //shows whether command executed successfully or not

shell - How to invoke an oozie workflow via shell script and block/wait till workflow completion

2 回答 2

Related

Reference