hadoop - AM Container is running beyond virtual memory limits

Question

I was playing with the distributed shell application (hadoop-2.0.0-cdh4.1.2). This is the error I'm getting at the moment.

13/01/01 17:09:09 INFO distributedshell.Client: Got application report from ASM for, appId=5, clientToken=null, appDiagnostics=Application application_1357039792045_0005 failed 1 times due to AM Container for appattempt_1357039792045_0005_000001 exited with  exitCode: 143 due to: Container [pid=24845,containerID=container_1357039792045_0005_01_000001] is running beyond virtual memory limits. Current usage: 77.8mb of 512.0mb physical memory used; 1.1gb of 1.0gb virtual memory used. Killing container.
Dump of the process-tree for container_1357039792045_0005_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24849 24845 24845 24845 (java) 165 12 1048494080 19590 /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug
|- 24845 23394 24845 24845 (bash) 0 0 108654592 315 /bin/bash -c /usr/java/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 128 --num_containers 1 --priority 0 --shell_command ping --shell_args localhost --debug 1>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stdout 2>/tmp/logs/application_1357039792045_0005/container_1357039792045_0005_01_000001/AppMaster.stderr

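Interestingly, there seems to be no problem with the setup, since a simple ls or uname command completed successfully and its output was available in container2's stdout.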

Regarding the setup, yarn.nodenamager.vmem-pmem-ratio is 3 and the total physical memory available is 2GB, which I think is more than enough to run this.

For the command in question, "ping localhost" produced two replies, as can be seen in containerlogs/container_1357039792045_0005_01_000002/721917/stdout/?start=-4096.

So, what could be the problem?

score 14 · Accepted Answer

From the error message, you can see that you are using more virtual memory than your current limit of 1.0gb. This can be resolved in two ways:

Disable the virtual memory limit check

YARN will then simply ignore the limit. To do this, add the following to your yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers.</description>
</property>

The default value for this setting is true.

Increase the virtual memory to physical memory ratio

In your yarn-site.xml, change it to a value higher than the current setting:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
</property>

The default is 2.1.
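That default is enough to reproduce the numbers in the log. The sketch below assumes two things. First, that the NodeManager's limit is the container's physical allocation (512 MB here) multiplied by the ratio, as the property description above says. Second, that the usage it compares against is the sum of the VMEM_USAGE(BYTES) column in the process-tree dump. The helper names are illustrative, not Hadoop APIs.

```python
MB = 1024 * 1024

# VMEM_USAGE(BYTES) column from the process-tree dump in the question:
# the java ApplicationMaster (pid 24849) and the bash wrapper (pid 24845).
PROCESS_TREE_VMEM = (1048494080, 108654592)


def vmem_limit_bytes(pmem_mb, vmem_pmem_ratio):
    """Assumed rule: virtual limit = physical allocation * ratio."""
    return int(pmem_mb * MB * vmem_pmem_ratio)


def exceeds_limit(pmem_mb, vmem_pmem_ratio, usage=PROCESS_TREE_VMEM):
    """True if the whole process tree uses more virtual memory than allowed."""
    return sum(usage) > vmem_limit_bytes(pmem_mb, vmem_pmem_ratio)


if __name__ == "__main__":
    print(f"process tree uses {sum(PROCESS_TREE_VMEM) / MB:.1f} MB virtual")
    for ratio in (2.1, 3, 5):
        limit = vmem_limit_bytes(512, ratio)
        verdict = "killed" if exceeds_limit(512, ratio) else "ok"
        print(f"ratio {ratio}: limit {limit / MB:.1f} MB -> {verdict}")
```

With 2.1, the limit is about 1075 MB, which the log rounds to "1.0gb". The tree used roughly 1104 MB ("1.1gb"). A ratio of 3 would have allowed 1536 MB. The 1.0gb limit in the log therefore suggests that the NodeManager was not actually running with the ratio of 3 mentioned in the question.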

You could also increase the amount of physical memory allocated to the container.

Don't forget to restart YARN after changing the configuration.
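Hadoop's Configuration looks properties up by their exact name. A misspelled entry is stored but never read, so the default stays in force. Note that the question spells the property yarn.nodenamager.vmem-pmem-ratio. The following is a minimal sketch of reading the two settings the way a NodeManager would see them. It assumes the defaults quoted in this answer (true and 2.1), and it ignores Hadoop features such as final, XInclude and ${var} substitution. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Defaults as quoted in the answer above.
DEFAULTS = {
    "yarn.nodemanager.vmem-check-enabled": "true",
    "yarn.nodemanager.vmem-pmem-ratio": "2.1",
}


def effective_vmem_settings(yarn_site_xml):
    """Return (check_enabled, ratio) from a yarn-site.xml string.

    Only exact property names override the defaults; anything else
    (including a typo in the name) is simply never consulted.
    """
    root = ET.fromstring(yarn_site_xml)
    settings = dict(DEFAULTS)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name in DEFAULTS:
            settings[name] = value
    check = settings["yarn.nodemanager.vmem-check-enabled"].lower() == "true"
    ratio = float(settings["yarn.nodemanager.vmem-pmem-ratio"])
    return check, ratio


if __name__ == "__main__":
    misspelled = """<configuration>
  <property>
    <name>yarn.nodenamager.vmem-pmem-ratio</name>
    <value>3</value>
  </property>
</configuration>"""
    # The typo means the ratio falls back to the default.
    print(effective_vmem_settings(misspelled))
```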

score 13 · Accepted Answer

There is no need to change the cluster configuration. I found that simply passing the extra parameter

-Dmapreduce.map.memory.mb=4096

to distcp did the trick for me.

score 3 · Accepted Answer

If you are running the Tez framework, you must set the following parameters in tez-site.xml:

tez.am.resource.memory.mb
tez.task.resource.memory.mb
tez.am.java.opts

and these in yarn-site.xml:

yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-check-enabled
yarn.nodemanager.vmem-pmem-ratio

All of these parameters need to be set.
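These settings constrain one another, so a value that is fine in isolation can still fail. The sketch below is one way to check a set of values together. Its assumptions are:

- The scheduler rounds a request up to a multiple of yarn.scheduler.minimum-allocation-mb, capped at the maximum. This is roughly the CapacityScheduler's behaviour with its default resource calculator; other schedulers may differ.
- The -Xmx in tez.am.java.opts should stay below the container size.
- The virtual limit is the container size times vmem-pmem-ratio.

The function names and sample values are hypothetical.

```python
import re

_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def java_heap_mb(java_opts):
    """Heap size in MB from the last -Xmx flag (the one the JVM uses), or None."""
    matches = _XMX.findall(java_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNIT_BYTES[unit.lower()] // (1024 ** 2)


def check_tez_am(conf):
    """Return (container_mb, vmem_limit_mb or None, problems) for the Tez AM."""
    problems = []
    am_mb = int(conf["tez.am.resource.memory.mb"])
    min_mb = int(conf["yarn.scheduler.minimum-allocation-mb"])
    max_mb = int(conf["yarn.scheduler.maximum-allocation-mb"])
    node_mb = int(conf["yarn.nodemanager.resource.memory-mb"])

    if am_mb > max_mb:
        problems.append(f"AM request {am_mb} MB > maximum-allocation-mb {max_mb}")
    if max_mb > node_mb:
        problems.append(f"maximum-allocation-mb {max_mb} > NodeManager memory {node_mb}")

    # Assumed normalization: round up to a multiple of the minimum, cap at the maximum.
    container_mb = min(-(-am_mb // min_mb) * min_mb, max_mb)

    heap_mb = java_heap_mb(conf.get("tez.am.java.opts", ""))
    if heap_mb is not None and heap_mb >= container_mb:
        problems.append(f"-Xmx{heap_mb}m leaves no headroom in a {container_mb} MB container")

    vmem_limit_mb = None
    if conf.get("yarn.nodemanager.vmem-check-enabled", "true").lower() == "true":
        ratio = float(conf.get("yarn.nodemanager.vmem-pmem-ratio", "2.1"))
        vmem_limit_mb = container_mb * ratio
    return container_mb, vmem_limit_mb, problems


if __name__ == "__main__":
    sample = {
        "tez.am.resource.memory.mb": "2048",
        "tez.am.java.opts": "-Xmx1638m",
        "yarn.nodemanager.resource.memory-mb": "8192",
        "yarn.scheduler.minimum-allocation-mb": "1024",
        "yarn.scheduler.maximum-allocation-mb": "8192",
        "yarn.nodemanager.vmem-check-enabled": "true",
        "yarn.nodemanager.vmem-pmem-ratio": "2.1",
    }
    print(check_tez_am(sample))
```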

score 0 · Accepted Answer

You can change the following in yarn-site.xml to a value higher than the default of 1 GB:

yarn.app.mapreduce.am.resource.mb

score 0 · Accepted Answer

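In practice, I have seen this issue when running queries against tables with a large number of small files, against non-bucketed tables, or against a large number of partitions of a big table.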

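The problem occurs when Tez tries to calculate how many mappers it needs to spawn. While doing this calculation, it tends to run out of memory (OOM) because the default value (1gb) is too small.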

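The way to fix this is to set tez.am.resource.memory.mb to 2gb or 4gb. Another very important point: this setting cannot be set from within the Hive query, because by then it is too late. The AM is the first container YARN spawns, so setting it inside the Hive query has no effect.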

The setting needs to be made in *-site.xml, or when launching the Hive shell, like this:

hive --hiveconf tez.am.resource.memory.mb=2048 -f my-large-query.hql

In the example above, YARN is asked to spawn a 2gb AM instead of the default.

Reference: http://moi.vonos.net/bigdata/hive-cli-memory/

score -1 · Accepted Answer

OK, found it. Increase the master memory parameter (--master_memory) to above 750MB and the YARN application will run successfully.
