
I'm trying to wrap my head around the actual purpose of the new API, and while reading around the internet I have found different answers to the same questions I was dealing with.

The questions I'd like to know the answers to are:

1) Which of the MRv2/YARN daemons is responsible for launching application containers and monitoring application resource usage?

2) Which two issues is MRv2/YARN designed to address?

I'll try to make this thread educational and constructive for other readers by citing the resources and actual data from my searches, so I hope it doesn't look like I have provided too much information when I could have just asked the questions and kept my post shorter.

For the 1st question, reading in the documentation, I could find 3 main resources to rely on:

From Hadoop documentation:

ApplicationMaster<-->NodeManager Launch containers. Communicate with NodeManagers by using NMClientAsync objects, handling container events by NMClientAsync.CallbackHandler

The ApplicationMaster communicates with YARN cluster, and handles application execution. It performs operations in an asynchronous fashion. During application launch time, the main tasks of the ApplicationMaster are:

a) communicating with the ResourceManager to negotiate and allocate resources for future containers, and

b) after container allocation, communicating with YARN NodeManagers (NMs) to launch application containers on them.
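To make steps (a) and (b) concrete, here is a minimal toy model in Python. It is only a sketch of the division of labour described in the quote: the class and method names (`allocate`, `start_container`, `run`) are hypothetical and are not the real YARN API, and the round-robin placement is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: int
    node: str


@dataclass
class NodeManager:
    """Toy stand-in for the per-node daemon; not the real YARN class."""
    node: str
    running: list = field(default_factory=list)

    def start_container(self, container, command):
        # The process is physically started on this node.
        self.running.append((container.container_id, command))
        return f"{self.node} started container {container.container_id}: {command}"


@dataclass
class ResourceManager:
    """Toy scheduler that only hands out container grants."""
    nodes: list
    next_id: int = 1

    def allocate(self, count):
        grants = []
        for i in range(count):
            node = self.nodes[i % len(self.nodes)]  # naive round-robin (assumption)
            grants.append(Container(self.next_id, node))
            self.next_id += 1
        return grants


class ApplicationMaster:
    def __init__(self, rm, node_managers):
        self.rm = rm
        self.node_managers = node_managers  # node name -> NodeManager

    def run(self, commands):
        # Step (a): negotiate containers with the ResourceManager.
        grants = self.rm.allocate(len(commands))
        # Step (b): ask the NodeManager that owns each container to launch it.
        return [
            self.node_managers[c.node].start_container(c, cmd)
            for c, cmd in zip(grants, commands)
        ]


nms = {name: NodeManager(name) for name in ("node1", "node2")}
am = ApplicationMaster(ResourceManager(list(nms)), nms)
log = am.run(["map-0", "map-1", "reduce-0"])
```

In this model the ApplicationMaster decides what runs and where, but the only code that actually starts anything lives in `NodeManager.start_container`.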

From Hortonworks documentation:

The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating appropriate resource containers from the ResourceManager, tracking their status and monitoring progress.

From Cloudera documentation:

MRv2 daemons -

ResourceManager – one per cluster – Starts ApplicationMasters, allocates resources on slave nodes

ApplicationMaster – one per job – Requests resources, manages individual Map and Reduce tasks

NodeManager – one per slave node – Manages resources on individual slave nodes

JobHistory – one per cluster – Archives jobs’ metrics and metadata

Back to the question (which daemon is responsible for launching application containers and monitoring application resource usage?), I ask myself:

Is it the NodeManager? Is it the ApplicationMaster?

From what I understand, the ApplicationMaster is the one that makes the NodeManager actually get the job done, so it is like asking who is responsible for lifting a box off the ground: the hands that did the actual lifting, or the mind that controls the body and makes them do it?

It is a tricky question, I guess, but there has to be only one answer to it.

For the 2nd question, reading online, I could find different answers from many resources and thus the confusion, but my main sources would be:

From Cloudera documentation:

MapReduce v2 (“MRv2”) – Built on top of YARN (Yet Another Resource Negotiator)

– Uses ResourceManager/NodeManager architecture

– Increases scalability of cluster

– Node resources can be used for any type of task

– Improves cluster utilization

– Support for non-MR jobs

Back to the question (which two issues is MRv2/YARN designed to address?): I know MRv2 made a few changes, such as relieving resource pressure on the JobTracker (in MRv1 the maximum number of nodes in a cluster was around 4,000, and in MRv2 it is more than twice that), and I also know it provides the ability to run frameworks other than MapReduce, such as MPI.

From documentation:

The Application Master provides much of the functionality of the traditional ResourceManager so that the entire system can scale more dramatically. In tests, we’ve already successfully simulated 10,000 node clusters composed of modern hardware without significant issue.

and:

Moving all application framework specific code into the ApplicationMaster generalizes the system so that we can now support multiple frameworks such as MapReduce, MPI and Graph Processing.

But I also think it dealt with the fact that the NameNode was a single point of failure, and that the new version adds a Standby NameNode via high-availability mode (I might be confusing features of the old vs. new API with features of MRv1 vs. MRv2, and that might be the cause of my question):

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

So if you had to choose 2 of the 3, which two are the issues MRv2/YARN is designed to address?

-Resource pressure on the JobTracker

-Ability to run frameworks other than MapReduce, such as MPI.

-Single point of failure in the NameNode.

Thank you in advance! D


3 Answers


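Which of the MRv2/YARN daemons is responsible for launching application containers and monitoring application resource usage?

The ResourceManager (RM) is responsible for launching the ApplicationMaster (AM) for a particular job. Once the AM is up, it is the AM's responsibility to negotiate, allocate, and monitor the job's resources (containers).

I suggest you read the section on the anatomy of a MapReduce job in Ch. 6 of Hadoop: The Definitive Guide for an in-depth look at how job resources are allocated in MR1 and MR2.

Which two issues is MRv2/YARN designed to address?

YARN tries to split the functionality of the MR1 JobTracker (which was the bottleneck for scaling) into separate abstractions:

  • Cluster resource management - the ResourceManager
  • Application lifecycle management - an ApplicationMaster specific to each application/job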
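The split above can be sketched roughly as follows. This is an illustrative toy, not YARN code: the `containers_needed` interface, the class names, and the "grant whatever is free" policy are all assumptions chosen to show that the cluster-level component needs no framework-specific knowledge once lifecycle logic lives in the ApplicationMaster.

```python
from typing import Protocol


class ApplicationMaster(Protocol):
    """Hypothetical interface: all the cluster layer needs from any framework."""
    def containers_needed(self) -> int: ...


class MapReduceAM:
    """Framework-specific lifecycle logic for a MapReduce job."""
    def __init__(self, maps, reduces):
        self.maps, self.reduces = maps, reduces

    def containers_needed(self):
        return self.maps + self.reduces


class MpiAM:
    """Framework-specific lifecycle logic for an MPI job."""
    def __init__(self, ranks):
        self.ranks = ranks

    def containers_needed(self):
        return self.ranks


class ResourceManager:
    """Cluster resource management only; knows nothing about map, reduce, or MPI."""
    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, am: ApplicationMaster) -> int:
        # Grant as many containers as are free, up to what the AM asks for.
        granted = min(self.free, am.containers_needed())
        self.free -= granted
        return granted


rm = ResourceManager(total_containers=10)
mr_granted = rm.allocate(MapReduceAM(maps=4, reduces=2))
mpi_granted = rm.allocate(MpiAM(ranks=8))
```

Because `ResourceManager.allocate` only sees a container count, adding a new framework means writing a new ApplicationMaster and leaving the cluster layer untouched, which is the generalization the question's second option refers to.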

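So if you had to choose 2 of the 3, which two are the issues MRv2/YARN is designed to address?

- Resource pressure on the JobTracker

- Ability to run frameworks other than MapReduce, such as MPI.

- Single point of failure in the NameNode.

Of your 3 options, I would choose 1 and 2.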

answered 2015-01-13T05:48:58.317

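According to Cloudera: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_mapreduce_to_yarn_migrate.html#concept_z1p_gmy_xl_unique_2

The TaskTracker has been replaced with the NodeManager, a YARN service that manages resources and deployment on a host. It is responsible for launching containers, each of which can house a map or reduce task.

So it is the NodeManager that launches containers for mapred tasks.

The ApplicationMaster container is launched by the ResourceManager.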

answered 2015-08-26T10:20:57.990

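Just to clarify: "The ApplicationMaster container is launched by the ResourceManager" above means that the ResourceManager instructs a NodeManager to launch the ApplicationMaster container. The actual launch of the ApplicationMaster container is also done by the NodeManager.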
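A tiny sketch of that distinction, with hypothetical names (`submit_application`, `launch`) that are not part of the real YARN API, and with "pick the first NodeManager" as a placeholder assumption for the scheduling decision:

```python
class NodeManager:
    def __init__(self, name):
        self.name = name
        self.launched = []

    def launch(self, what):
        # Every container, including the AM's, is started by a NodeManager.
        self.launched.append(what)


class ResourceManager:
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit_application(self, app_name):
        # The RM decides where the AM runs but delegates the launch itself.
        chosen = self.node_managers[0]  # placeholder scheduling decision
        chosen.launch(f"ApplicationMaster for {app_name}")
        return chosen


nm1, nm2 = NodeManager("nm1"), NodeManager("nm2")
rm = ResourceManager([nm1, nm2])
host = rm.submit_application("wordcount")
# Once running, the AM asks NodeManagers to launch the task containers.
nm2.launch("map task for wordcount")
```

Here the ResourceManager never appends to a `launched` list itself; it only chooses a NodeManager and tells it what to start.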

answered 2015-08-26T10:28:44.797