hadoop - MapReduce history is deleted when Hadoop restarts

Question

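I am running several Hadoop tests with the TestDFSIO and TeraSort benchmark tools. I am basically testing different numbers of datanodes to evaluate processing capacity and how linearly performance scales with the number of datanodes.

During this process I obviously had to restart the whole Hadoop environment many times. Every time I restart Hadoop, all MapReduce jobs are deleted and the job counter starts again from "job_2013*_0001". For comparison purposes, it is very important for me to keep all the MapReduce jobs I launched previously. So my questions are:

How can I prevent Hadoop from deleting all MapReduce job history after a restart? Is there a property that controls job deletion after the Hadoop environment is restarted?

Thanks!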

score 0 · Accepted Answer

The MR job history logs are not deleted right away when you restart Hadoop. However, job numbering starts again from *_0001, and only jobs started after the restart are displayed on the ResourceManager web portal. In fact, there are two log-related settings in the YARN defaults:

# this is where you can find the MR job history logs
yarn.nodemanager.log-dirs = ${yarn.log.dir}/userlogs 

# this is how long the history logs will be retained
yarn.nodemanager.log.retain-seconds = 10800

The default ${yarn.log.dir} is defined in $HADOOP_HOME/etc/hadoop/yarn-env.sh:

YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"

BTW, similar settings can also be found in mapred-env.sh if you are using Hadoop 1.X.
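
If the goal is to keep these logs around longer for comparison, the retention setting above can be overridden in yarn-site.xml. The following is only a minimal sketch: the 7-day value (604800 seconds) is an illustrative assumption, not something from the defaults, and according to the property's description in yarn-default.xml it only applies when log aggregation is disabled.

```xml
<!-- yarn-site.xml: sketch only, the value below is an assumed example -->
<configuration>
  <property>
    <!-- how long the NodeManager keeps user logs, in seconds (default 10800) -->
    <name>yarn.nodemanager.log.retain-seconds</name>
    <!-- 604800 = 7 days; pick whatever covers your benchmark runs -->
    <value>604800</value>
  </property>
</configuration>
```

The NodeManagers would need to be restarted for the change to take effect.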
