
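I am running several Hadoop tests with the TestDFSIO and TeraSort benchmark tools. I am basically testing different numbers of DataNodes to evaluate processing capacity and how linearly performance scales as DataNodes are added.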

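During this process I obviously had to restart the whole Hadoop environment several times. Every time I restart Hadoop, all MapReduce jobs are removed and the job counter starts again from "job_2013*_0001". For comparison purposes, it is very important for me to keep all the MapReduce jobs I launched earlier. So my question is: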

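How can I prevent Hadoop from deleting the MapReduce job history after a restart? Is there a property that controls whether jobs are deleted when the Hadoop environment is restarted?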

Thanks!


1 Answer


The MR job history logs are not deleted right away when you restart Hadoop. New jobs are numbered from *_0001 again, though, and the ResourceManager web portal only displays jobs that were started after the restart. In fact, there are two log-related settings in the YARN defaults (yarn-default.xml):

# this is where you can find the MR job history logs
yarn.nodemanager.log-dirs = ${yarn.log.dir}/userlogs 

# how long the logs are retained, in seconds (10800 s = 3 hours);
# only applies when log aggregation is disabled
yarn.nodemanager.log.retain-seconds = 10800
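To see which application log directories would already be outside that retention window, here is a minimal Python sketch. It assumes you know the NodeManager log directory (the path in the usage line is hypothetical) and it uses each directory's modification time as a rough proxy for when the application finished. The NodeManager itself schedules deletion from the application's actual finish time, so treat the result as an approximation. The helper name `expired_app_log_dirs` is made up for illustration.

```python
import os
import time
from typing import List, Optional

# Default of yarn.nodemanager.log.retain-seconds quoted above (3 hours).
DEFAULT_RETAIN_SECONDS = 10800

def expired_app_log_dirs(log_dir: str,
                         retain_seconds: int = DEFAULT_RETAIN_SECONDS,
                         now: Optional[float] = None) -> List[str]:
    """Hypothetical helper: list application log dirs older than retain_seconds.

    Uses each directory's mtime as a rough stand-in for the application's
    finish time (an assumption; the NodeManager tracks the real finish time).
    """
    now = time.time() if now is None else now
    expired = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Container logs are grouped per application: application_<ts>_<seq>
        if name.startswith("application_") and os.path.isdir(path):
            if now - os.path.getmtime(path) > retain_seconds:
                expired.append(name)
    return expired
```

Usage (hypothetical path): `print(expired_app_log_dirs("/opt/hadoop/logs/userlogs"))`. Passing `now` explicitly makes the check deterministic, which is handy when comparing runs.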

The default ${yarn.log.dir} is defined in $HADOOP_HOME/etc/hadoop/yarn-env.sh:

YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"
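The ${yarn.log.dir} placeholder in the first setting is filled in by variable expansion. As a rough illustration (not Hadoop's actual `Configuration` code), this Python sketch expands `${name}` placeholders from a mapping. The `/opt/hadoop` install location is an assumption, and the sketch supposes the shell has already expanded $HADOOP_YARN_HOME inside YARN_LOG_DIR before the launcher script passes the value to the daemon as the yarn.log.dir property (as Hadoop 2.x's bin/yarn does).

```python
import re
from typing import Mapping

# Matches ${name}; Hadoop property names may contain dots, e.g. ${yarn.log.dir}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def expand(value: str, props: Mapping[str, str], max_depth: int = 20) -> str:
    """Repeatedly substitute ${name} placeholders from props.

    Unknown names are left untouched; max_depth guards against cycles.
    """
    for _ in range(max_depth):
        new = _PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if new == value:
            break
        value = new
    return value

# Assumed: HADOOP_YARN_HOME=/opt/hadoop, so YARN_LOG_DIR became /opt/hadoop/logs.
props = {"yarn.log.dir": "/opt/hadoop/logs"}
print(expand("${yarn.log.dir}/userlogs", props))  # /opt/hadoop/logs/userlogs
```

Expanding repeatedly rather than once lets a property refer to another property that itself contains a placeholder.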

By the way, similar settings can also be found in mapred-env.sh if you are using Hadoop 1.x.

answered 2013-11-25T05:30:37.637