hadoop - Oozie job stuck in RUNNING state

Question

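I have a simple workflow that runs a MapReduce job as a shell action. After I submit the job, its status changes to RUNNING and stays there; it never finishes. The MapReduce cluster shows two jobs running: one belongs to the shell action launcher and the other is the actual MapReduce job. However, the actual MapReduce job is shown as UNASSIGNED with zero progress (which means it has not started yet).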

Interestingly, when I kill the Oozie job, the MapReduce job actually starts running and completes successfully. It looks like the shell launcher is blocking it.

P.S. This is a simple workflow with no start or end date that could cause it to wait.

score 0 · Accepted Answer

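When a job is stuck in the UNASSIGNED state, it usually means the ResourceManager (RM) cannot allocate a container for the job. Check the capacity configuration for the user and the queue. Giving them more capacity should help.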

For Hadoop 2.7 with the Capacity Scheduler specifically, these are the properties to check:

yarn.scheduler.capacity.<queue-path>.capacity
yarn.scheduler.capacity.<queue-path>.user-limit-factor
yarn.scheduler.capacity.maximum-applications 
  / yarn.scheduler.capacity.<queue-path>.maximum-applications
yarn.scheduler.capacity.maximum-am-resource-percent 
  / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

See the Hadoop documentation for more details on these properties: Capacity Scheduler - Queue Properties
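
As a rough illustration of where these settings live, here is a minimal capacity-scheduler.xml sketch. It assumes a single queue at the path root.default, and every value is a placeholder to tune rather than a recommendation. The AM percentage is the one most likely to matter in this scenario: the Oozie launcher and the MapReduce job it spawns each need their own ApplicationMaster container, and the default of 0.1 can leave the second one waiting on a small cluster.

```xml
<!-- capacity-scheduler.xml: illustrative values only; queue path root.default is an assumption -->
<configuration>
  <!-- Share of the parent queue's capacity given to this queue, in percent -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
  </property>
  <!-- Multiple of the queue capacity that a single user may consume -->
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
  </property>
  <!-- Fraction of cluster resources that ApplicationMasters may occupy (default 0.1) -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
  </property>
</configuration>
```

After editing the file, running yarn rmadmin -refreshQueues should reload the queue configuration without restarting the ResourceManager.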

score 0 · Accepted Answer

Please consider the following in light of your memory resources.

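The number of containers depends on the number of blocks (input splits). If you have 2 GB of data with a 512 MB block size, YARN creates 4 map tasks and 1 reduce task. When running MapReduce, there are some rules to follow when submitting jobs. (This applies to small clusters.)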

Configure the following properties according to your RAM, disk, and cores.

<property>
  <description>The minimum allocation for every container request at the RM,
  in MBs. Memory requests lower than this won't take effect,
  and the specified value will get allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>

<property>
  <description>The maximum allocation for every container request at the RM,
  in MBs. Memory requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

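Also set the Java heap size according to your memory resources. Once the properties above are set appropriately in yarn-site.xml, the MapReduce job should run efficiently and complete successfully.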
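
For the heap-size part, a minimal mapred-site.xml sketch might look like the following. The container sizes are assumptions chosen to fit the 2048 MB NodeManager setting above, and the -Xmx values follow the common rule of thumb of roughly 80% of the container size; neither comes from the original question. Note that the ApplicationMaster container defaults to 1536 MB, which takes most of a 2048 MB node on its own.

```xml
<!-- mapred-site.xml: illustrative values only, sized for a 2048 MB NodeManager -->
<configuration>
  <!-- Container size for the MapReduce ApplicationMaster (default 1536) -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>512</value>
  </property>
  <!-- JVM heap for the ApplicationMaster, kept below the container size -->
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx410m</value>
  </property>
  <!-- Container size and JVM heap for each map task -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx410m</value>
  </property>
  <!-- Container size and JVM heap for each reduce task -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx410m</value>
  </property>
</configuration>
```

With containers this small, the launcher's ApplicationMaster and task plus the MapReduce job's ApplicationMaster can all fit on one 2048 MB node, which is the situation the question describes as blocked.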
