2

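I'm new to Hadoop. I have written some jobs and exported them as jar files. I can run them with the hadoop jar command, and I would like to run these jobs once every hour. How can I do that? Thanks in advance.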

4

4 Answers

3

Hadoop itself doesn't provide a way to schedule jobs like the one you describe, so you have two main choices: use Java's timer and scheduling facilities, or launch the jobs from the operating system. I would personally use cron for this. It is simple, very flexible, and installed by default on most servers, and there are plenty of tutorials for it.

Cron example that runs at minute 0 of every hour:

0 * * * *  /bin/hadoop jar myJar.jar

If you want to keep it inside Java itself, I would suggest checking out this question, which has details and code: How to schedule task for start of every hour.
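For the in-Java route, a minimal sketch might look like the following. It uses `ScheduledExecutorService` from the standard library, which is one possible approach and not necessarily the one used in the linked question. The class name `HourlyJobRunner` and its method names are hypothetical, and the `/bin/hadoop` path and `myJar.jar` are simply carried over from the cron example above; adjust them for your installation.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class: launches `hadoop jar <jar>` at the start of every hour.
final class HourlyJobRunner {

    // Milliseconds from `now` until the start of the next full hour.
    static long millisUntilNextHour(LocalDateTime now) {
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour).toMillis();
    }

    // Runs `hadoopBin jar jarPath` as a child process and returns its exit code.
    static int runHadoopJar(String hadoopBin, String jarPath) throws Exception {
        Process process = new ProcessBuilder(hadoopBin, "jar", jarPath)
                .inheritIO() // forward the job's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    // Schedules the job at the next full hour and then once per hour after that.
    static ScheduledExecutorService startHourly(String hadoopBin, String jarPath) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = millisUntilNextHour(LocalDateTime.now());
        long period = TimeUnit.HOURS.toMillis(1);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                int exitCode = runHadoopJar(hadoopBin, jarPath);
                System.out.println("hadoop exited with code " + exitCode);
            } catch (Exception e) {
                // Catch everything: an exception escaping the task would
                // cancel all later runs of a fixed-rate schedule.
                e.printStackTrace();
            }
        }, initialDelay, period, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Calling `HourlyJobRunner.startHourly("/bin/hadoop", "myJar.jar")` from your own `main` keeps the JVM alive and fires the job on the hour. Unlike cron, this only works while that JVM process is running, so you would still need something to start and supervise it.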

Answered 2013-05-06T18:21:18.380
3

You could probably do this with cron or a script of your own. In my opinion, though, a better approach is to use a scheduler such as Oozie.

Answered 2013-05-06T18:21:54.303
0

Adding another option alongside cron and Oozie: the Quartz scheduler.

Answered 2017-09-13T21:28:49.923
0

In addition to Oozie, which has already been mentioned, you may also want to take a look at Falcon.

In my own experience, however, a simpler approach is to use your CI system, for example Jenkins, so that you avoid adding a new system to your stack.

Answered 2015-09-08T15:15:16.333