amazon-web-services - Why are AWS Batch Jobs stuck in RUNNABLE?

Question

I use a compute environment of 0-256 m3.medium On-Demand instances. My job definition requires 1 CPU and 3 GB of RAM, which m3.medium has.

What are possible reasons why AWS Batch Jobs are stuck in state RUNNABLE?

AWS says:

A job that resides in the queue, has no outstanding dependencies, and is therefore ready to be scheduled to a host. Jobs in this state are started as soon as sufficient resources are available in one of the compute environments that are mapped to the job’s queue. However, jobs can remain in this state indefinitely when sufficient resources are unavailable.

But that does not answer my question.

score 32 · Accepted Answer

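There are other reasons why a job can get stuck in RUNNABLE:

The role associated with the compute environment has insufficient permissions.
The compute environment instances cannot reach the Internet. You need to associate a NAT gateway or an Internet gateway with the compute environment's subnet.
- Be sure to check the "Enable auto-assign public IPv4 address" setting on the compute environment's subnet. (Pointed out by @thisisbrians in the comments.)
There is a problem with your image. You need to use an ECS-optimized AMI or make sure the ECS container agent is working. More information is in the AWS docs.
You are trying to launch instances of a type for which your account is limited to 0 instances (EC2 console > Limits, in the left menu). (Read more in gergely-danyi's comment.)
As mentioned in the question, insufficient resources.

Also, be sure to read the AWS Batch troubleshooting guide.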
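The networking cause in the list above can be checked offline against saved output of the EC2 `describe_subnets` and `describe_route_tables` calls. What follows is a minimal sketch, assuming the dictionaries use the field names those calls return (`Routes`, `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `MapPublicIpOnLaunch`). The function names and the sample IDs are hypothetical, and the check covers only the two settings named in the answer.

```python
def default_route_target(route_table):
    """Return 'igw', 'nat', or None for the table's 0.0.0.0/0 route."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # Internet gateway IDs start with "igw-"; the VPC-local route uses "local".
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "igw"
        if route.get("NatGatewayId"):
            return "nat"
    return None


def subnet_network_findings(subnet, route_table):
    """List likely reasons why instances in this subnet cannot reach the Internet."""
    target = default_route_target(route_table)
    if target is None:
        return ["no default route through an Internet or NAT gateway"]
    if target == "igw" and not subnet.get("MapPublicIpOnLaunch", False):
        return ["default route uses an Internet gateway, "
                "but the subnet does not auto-assign public IPv4 addresses"]
    return []


# Hypothetical sample data shaped like the EC2 API responses.
public_routes = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
]}
print(subnet_network_findings({"MapPublicIpOnLaunch": False}, public_routes))
```

An empty list means neither of the two problems was found; it does not prove that the instance can register with ECS.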

score 8 · Accepted Answer

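At a minimum, the roles should be defined with the following policies and trust relationships. Without them, jobs will be stuck in RUNNABLE because they do not have sufficient permissions to start: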

AWSBatchServiceRole

Attached policy: AWSBatchServiceRole

Trust relationship: batch.amazonaws.com

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batch.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

ecsInstanceRole

Attached policy: AmazonEC2ContainerServiceforEC2Role

Trust relationship: ec2.amazonaws.com

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
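The two trust relationships above can be verified mechanically. The sketch below is one possible check, assuming the trust policy is available as a JSON string like the documents shown here. The helper names and the `EXPECTED_SERVICE` table are hypothetical and only encode the two role/service pairs from this answer.

```python
import json

# Role name -> service principal that must be allowed to assume it (from this answer).
EXPECTED_SERVICE = {
    "AWSBatchServiceRole": "batch.amazonaws.com",
    "ecsInstanceRole": "ec2.amazonaws.com",
}


def _as_list(value):
    """IAM allows a single string or dict where a list is also valid."""
    return value if isinstance(value, list) else [value]


def trusted_services(trust_policy_json):
    """Collect the service principals allowed to call sts:AssumeRole."""
    policy = json.loads(trust_policy_json)
    services = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict):  # skip the wildcard form "Principal": "*"
            services.update(_as_list(principal.get("Service", [])))
    return services


def role_trusts_expected_service(role_name, trust_policy_json):
    """True if the role's trust policy names the service this answer expects."""
    return EXPECTED_SERVICE[role_name] in trusted_services(trust_policy_json)


batch_trust = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "batch.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}"""
print(role_trusts_expected_service("AWSBatchServiceRole", batch_trust))
```

This only inspects the trust relationship; whether the managed policy is actually attached to the role still has to be confirmed separately.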

score 3 · Accepted Answer

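I just fought with this for a while and found the answer.

One possible reason for jobs getting stuck in RUNNABLE is that there are no instances to run the job on. If that is the case, looking at the Auto Scaling group, as mentioned in another answer, can show you the actual error that is preventing instances from starting. That leads you to the exact problem, instead of having you try any number of solutions for problems you don't have. Error messages are our friends.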
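The Auto Scaling group's activity history is where those error messages appear. As a rough sketch, assuming a dictionary shaped like the response of the Auto Scaling `describe_scaling_activities` call (an `Activities` list whose entries carry `StatusCode`, `StatusMessage`, and `Description`), the failed launches can be pulled out as shown below. The function name and the sample message text are hypothetical.

```python
def failed_scaling_activities(response):
    """Return (status code, message) pairs for activities that did not succeed."""
    failures = []
    for activity in response.get("Activities", []):
        if activity.get("StatusCode") in ("Failed", "Cancelled"):
            # Prefer the detailed status message; fall back to the description.
            message = activity.get("StatusMessage") or activity.get("Description", "")
            failures.append((activity["StatusCode"], message))
    return failures


# Hypothetical sample data; real messages come from the Auto Scaling API.
sample_response = {"Activities": [
    {"StatusCode": "Successful", "Description": "Launching a new EC2 instance"},
    {"StatusCode": "Failed", "StatusMessage": "example launch error text"},
]}
for status, message in failed_scaling_activities(sample_response):
    print(status, "-", message)
```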

score 1 · Accepted Answer

In case it is useful, I wanted to share this very helpful video from an AWS Cloud Support Engineer:

https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/

score 1 · Accepted Answer

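Your compute environment may be invalid. Check the Status column under AWS Batch -> Compute environments. Mine said INVALID, and a symbol was shown next to the compute environment name.

Clicking the compute environment gave me more information: my AMI ID was wrong.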
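The same status check can be done without the console. This is a minimal sketch, assuming a dictionary shaped like the response of the AWS Batch `describe_compute_environments` call (a `computeEnvironments` list with `computeEnvironmentName`, `status`, and `statusReason`). The function name and the sample values are hypothetical.

```python
def invalid_compute_environments(response):
    """Return (name, status, reason) for every environment that is not VALID."""
    problems = []
    for env in response.get("computeEnvironments", []):
        status = env.get("status")
        if status != "VALID":
            problems.append((
                env.get("computeEnvironmentName"),
                status,
                env.get("statusReason", ""),
            ))
    return problems


# Hypothetical sample data; the reason text is a placeholder, not a real AWS message.
sample_envs = {"computeEnvironments": [
    {"computeEnvironmentName": "env-a", "status": "VALID", "statusReason": ""},
    {"computeEnvironmentName": "env-b", "status": "INVALID",
     "statusReason": "example reason text"},
]}
for name, status, reason in invalid_compute_environments(sample_envs):
    print(name, status, reason)
```

Transitional statuses such as CREATING or UPDATING are also reported by this sketch, so a non-empty result is a prompt to read the reason, not proof of a misconfiguration.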
