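I created a CloudFormation template that creates an ECS service and task definition, with autoscaling for the tasks. It is very basic: if the task's MemoryUtilization reaches a certain value, add 1 task, and vice versa. Here are some of the most relevant parts of the template.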

  EcsTd:
    Type: AWS::ECS::TaskDefinition
    DependsOn: LogGroup
    Properties:
      Family: !Sub ${EnvironmentName}-${PlatformName}-${Type}
      ContainerDefinitions:
      - Name: !Sub ${EnvironmentName}-${PlatformName}-${Type}
        Image: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${PlatformName}:${ImageVersion}
        Environment:
        - Name: APP_ENV
          Value: !If [isProd, "production", "staging"]
        - Name: APP_DEBUG
          Value: "false"
        ...

        PortMappings:
        - ContainerPort: 80
          HostPort: 0
        Memory: !Ref Memory
        Essential: true
  EcsService:
    Type: AWS::ECS::Service
    DependsOn: WaitForLoadBalancerListenerRulesCondition
    Properties:
      ServiceName: !Sub ${EnvironmentName}-${PlatformName}-${Type}
      Cluster:
        Fn::ImportValue: !Sub ${EnvironmentName}-ECS-${Type}
      DesiredCount: !Sub ${DesiredCount}
      TaskDefinition: !Ref EcsTd
      Role: "learningEcsServiceRole"
      LoadBalancers:
      - !If
        - isWeb
        - ContainerPort: 80
          ContainerName: !Sub ${EnvironmentName}-${PlatformName}-${Type}
          TargetGroupArn: !Ref AlbTargetGroup
        - !Ref AWS::NoValue
  ServiceScalableTarget:
    Type: "AWS::ApplicationAutoScaling::ScalableTarget"
    Properties:
      MaxCapacity: !Sub ${MaxCount}
      MinCapacity: !Sub ${MinCount}
      ResourceId: !Join
      - /
      - - service
        - !Sub ${EnvironmentName}-${Type}
        - !GetAtt EcsService.Name
      RoleARN: arn:aws:iam::645618565575:role/learningEcsServiceRole
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

  ServiceScaleOutPolicy:
    Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
    Properties:
      PolicyName: !Sub ${EnvironmentName}-${PlatformName}-${Type}-ScaleOutPolicy
      PolicyType: StepScaling
      ScalingTargetId: !Ref ServiceScalableTarget
      StepScalingPolicyConfiguration:
        AdjustmentType: ChangeInCapacity
        Cooldown: 1800
        MetricAggregationType: Average
        StepAdjustments:
        - MetricIntervalLowerBound: 0
          ScalingAdjustment: 1
  MemoryScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentName}-${PlatformName}-${Type}-MemoryOver70PercentAlarm
      AlarmDescription: Alarm if memory utilization greater than 70% of reserved memory
      Namespace: AWS/ECS
      MetricName: MemoryUtilization
      Dimensions:
      - Name: ClusterName
        Value: !Sub ${EnvironmentName}-${Type}
      - Name: ServiceName
        Value: !GetAtt EcsService.Name
      Statistic: Maximum
      Period: '60'
      EvaluationPeriods: '1'
      Threshold: '70'
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
      - !Ref ServiceScaleOutPolicy
      - !Ref EmailNotification

  ...

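So when the tasks start running out of memory, we add a new task. At some point, however, we will hit the limit of the memory available in the cluster.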

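For example, if the cluster consists of one t2.small instance, we have 2 GB of RAM. A small part of it is used by the ECS agent running on the instance, so less than 2 GB is available to tasks. If we set the task's Memory value to 512 MB, we can place only 3 tasks in that cluster (3 × 512 MB = 1536 MB; a fourth would need the full 2048 MB) unless we scale the cluster out.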

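By default, ECS publishes a MemoryReservation metric that can be used to autoscale the cluster. We tell it to add 1 instance to the cluster when MemoryReservation exceeds 75%. That part is relatively easy.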

  EcsCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub ${EnvironmentName}-${Type}
  SgEcsHost:
    ...
  ECSLaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: !FindInMap [AWSRegionToAMI, !Ref 'AWS::Region', AMIID]
      InstanceType: !Ref InstanceType
      SecurityGroups: [ !Ref SgEcsHost ]
      AssociatePublicIpAddress: true
      IamInstanceProfile: "ecsInstanceRole"
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          echo ECS_CLUSTER=${EnvironmentName}-${Type} >> /etc/ecs/ecs.config
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
      - Fn::ImportValue: !Sub ${EnvironmentName}-SubnetEC2AZ1
      - Fn::ImportValue: !Sub ${EnvironmentName}-SubnetEC2AZ2
      LaunchConfigurationName: !Ref ECSLaunchConfiguration
      MinSize: !Ref AsgMinSize
      MaxSize: !Ref AsgMaxSize
      DesiredCapacity: !Ref AsgDesiredSize
      Tags:
      - Key: Name
        Value: !Sub ${EnvironmentName}-ECS
        PropagateAtLaunch: true
  ScalePolicyUp:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName:
        Ref: ECSAutoScalingGroup
      Cooldown: '1'
      ScalingAdjustment: '1'
  MemoryReservationAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      EvaluationPeriods: '1'
      Statistic: Average
      Threshold: '75'
      AlarmDescription: Alarm if MemoryReservation is more than 75%
      Period: '60'
      AlarmActions:
      - Ref: ScalePolicyUp
      - Ref: EmailNotification
      # MemoryReservation is published per cluster in the AWS/ECS namespace, not AWS/EC2
      Namespace: AWS/ECS
      Dimensions:
      - Name: ClusterName
        Value:
          Ref: EcsCluster
      ComparisonOperator: GreaterThanThreshold
      MetricName: MemoryReservation

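However, this makes no sense: the alarm fires as soon as the third task is added (3 × 512 MB is already just over 75% of the slightly-less-than-2 GB the instance offers), so the new instance sits empty until a fourth task is scheduled. That means we pay for capacity we do not use.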

I noticed that when the ECS service tries to add a task to a cluster that does not have enough free memory, I get

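service Production-admin-worker was unable to place a task because no container instance met all of its requirements. The closest matching container-instance ################### has insufficient memory available.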

In this example, the template parameters are:

EnvironmentName=Production
PlatformName=Admin
Type=worker

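Is it possible to create an AWS::CloudWatch::Alarm that watches the ECS cluster events and looks for this specific pattern? The idea is that the AWS::AutoScaling::AutoScalingGroup increases the instance count only when the AWS::ApplicationAutoScaling::ScalingPolicy adds a task that has no room in the cluster, and that the cluster scales in when MemoryReservation drops below 25% (meaning no tasks are running there, because the AWS::ApplicationAutoScaling::ScalingPolicy has removed them).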
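
The scale-in half of that idea could look roughly like the sketch below. It is only a sketch: ScalePolicyDown and MemoryReservationLowAlarm are hypothetical resource names that do not exist in the template above, and the Cooldown and EvaluationPeriods values are assumptions chosen to avoid flapping.

```yaml
  # Hypothetical scale-in policy: remove one instance each time the alarm fires.
  ScalePolicyDown:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref ECSAutoScalingGroup
      Cooldown: '300'            # assumed value, tune as needed
      ScalingAdjustment: '-1'
  # Hypothetical alarm: fires when less than 25% of the cluster memory is reserved.
  MemoryReservationLowAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Alarm if MemoryReservation is less than 25%
      Namespace: AWS/ECS
      MetricName: MemoryReservation
      Dimensions:
      - Name: ClusterName
        Value: !Ref EcsCluster
      Statistic: Average
      Period: '60'
      EvaluationPeriods: '5'     # assumed value, avoids reacting to short dips
      Threshold: '25'
      ComparisonOperator: LessThanThreshold
      AlarmActions:
      - !Ref ScalePolicyDown
```

Note that the Auto Scaling group picks the instance to terminate without knowing which instances still run tasks, so the ASG MinSize should stay high enough to hold the minimum task count.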

1 Answer

That means we pay for capacity we do not use.

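You either pay up front for extra/backup capacity, or you implement logic that retries the task placements that failed due to insufficient capacity.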

A few approaches I can think of:

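  • You can create a custom script/Lambda ( https://forums.aws.amazon.com/thread.jspa?threadID=94984 ) that reports a metric, say load_factor, computed as number of tasks / number of instances, and then base your autoscaling policy on it. The Lambda can be triggered by a CloudWatch (CW) rule.
    • You could also report this metric from your task implementation instead of a new custom Lambda/script.
  • Create a metric filter that looks for a specific pattern in a log file/group and reports a metric. Then, of course, use that metric for scaling.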
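
For the first approach, the "triggered by a CW rule" part might be wired up as in the sketch below. LoadFactorFunction is a hypothetical AWS::Lambda::Function, assumed to be defined elsewhere and to publish the load_factor metric; the resource names and the one-minute rate are assumptions too.

```yaml
  # Hypothetical: LoadFactorFunction is an AWS::Lambda::Function defined elsewhere
  # that publishes the custom load_factor metric.
  LoadFactorSchedule:
    Type: AWS::Events::Rule
    Properties:
      Description: Invoke the load_factor reporter once a minute
      ScheduleExpression: rate(1 minute)   # assumed interval
      State: ENABLED
      Targets:
      - Arn: !GetAtt LoadFactorFunction.Arn
        Id: LoadFactorReporter
  # Allows the rule above to invoke the function.
  LoadFactorInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref LoadFactorFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt LoadFactorSchedule.Arn
```

An AWS::CloudWatch::Alarm on load_factor would then drive the scaling policy of the Auto Scaling group in the same way the MemoryReservation alarm does.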

From the docs:

When a metric filter finds one of the terms, phrases, or values in your log events, you can increment the value of a CloudWatch metric.
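
A minimal sketch of the metric filter approach, under one big assumption: the ECS service event messages already land in a CloudWatch Logs group, referred to here by the hypothetical name EcsEventsLogGroup. Nothing in the question's template delivers them there. The metric namespace, the metric name, and the filter phrase are assumptions as well; the phrase has to match whatever text actually appears in that log group.

```yaml
  # Hypothetical: EcsEventsLogGroup is a log group that already receives the
  # service event messages; the templates in the question do not create it.
  PlacementFailureFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName: !Ref EcsEventsLogGroup
      FilterPattern: '"unable to place a task"'   # assumed phrase, taken from the event text
      MetricTransformations:
      - MetricNamespace: Custom/ECS               # assumed namespace
        MetricName: TaskPlacementFailures         # assumed metric name
        MetricValue: '1'
  # Scale the cluster out only when a placement failure was logged.
  PlacementFailureAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Alarm if a task could not be placed
      Namespace: Custom/ECS
      MetricName: TaskPlacementFailures
      Statistic: Sum
      Period: '60'
      EvaluationPeriods: '1'
      Threshold: '0'
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching              # no log lines means no failures
      AlarmActions:
      - !Ref ScalePolicyUp
```

With this in place the existing ScalePolicyUp adds an instance only after a task has actually failed to fit, rather than at a fixed MemoryReservation threshold.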

Answered 2018-05-03T13:53:42.933