I have an AWS EKS cluster with Prometheus and Prometheus-Adapter set up. The problem is that I have tried different values for the target (threshold) of my HPA metric, and with some values the scale-up works, while with others it fails.
Here is my HPA:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: tianbing-xxxxxxxxx-xxxxxxx-hpa
  labels:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tianbing-xxxxxxxxx-xxxxxxx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: plex_queue_size
      target:
        type: AverageValue
        averageValue: 2
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
    scaleDown:
      stabilizationWindowSeconds: 300
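The metric comes from Prometheus through Prometheus-Adapter. The adapter rule for it looks roughly like the following (a simplified sketch; the exact seriesQuery, label overrides and metricsQuery are approximations rather than my literal config):

rules:
- seriesQuery: 'plex_queue_size'
  resources:
    overrides:
      namespace: {resource: "namespace"}   # label names depend on the scrape config
      pod:       {resource: "pod"}
  name:
    matches: "plex_queue_size"
    as: "plex_queue_size"
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'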
In this case, with the target value set to 2, my HPA works perfectly ✅. Here is the log: you can see that when the current value was 8, it successfully scaled up to 8/2 = 4 replicas ✅ (the calculation is spelled out after the log):
$ kubectl get hpa -w
NAME                             REFERENCE                               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   <unknown>/2   1         5         1          31s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   0/2           1         5         1          46s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         1          3m34s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         4          3m49s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         4          4m5s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7666m/2       1         5         5          4m20s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7666m/2       1         5         5          4m25s
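As far as I understand, this matches the standard formula the HPA controller uses (from the Kubernetes HPA documentation):

desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
                = ceil( 1 * 8 / 2 )
                = 4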
However, if I now update my HPA manifest and change the target value to something larger, for example 5 (I delete the whole HPA and create a new one):
  metrics:
  - type: Pods
    pods:
      metric:
        name: plex_queue_size
      target:
        type: AverageValue
        averageValue: 5
From the log you can see that the metric is scraped (it starts at 0), but even after waiting for a long time, no scale-up ever happens ❌. The current value is collected, yet no scale-up action is taken ❌. In my case I should get 8/5, i.e. 2 desired replicas (see the calculation after the log), but nothing happens ❌.
This is what I get from kubectl get hpa and from describing the HPA:
NAME                             REFERENCE                               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   <unknown>/5   1         5         1          75s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   0/5           1         5         1          91s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/5           1         5         1          3m32s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7/5           1         5         1          4m18s
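By the same formula I would expect a scale-up to 2 replicas here, but REPLICAS never moves from 1:

desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
                = ceil( 1 * 8 / 5 )
                = 2    (expected, but it never happens)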
$ kubectl describe horizontalpodautoscaler.autoscaling/tianbing-xxxxxxxxx-xxxxxxx-hpa
Name:               tianbing-xxxxxxxxx-xxxxxxx-hpa
Namespace:          default
Labels:             app.kubernetes.io/managed-by=Helm
Annotations:        meta.helm.sh/release-name: tianbing
                    meta.helm.sh/release-namespace: default
CreationTimestamp:  Wed, 10 Mar 2021 14:14:13 -0800
Reference:          Deployment/tianbing-xxxxxxxxx-xxxxxxx
Metrics:            ( current / target )
  "plex_queue_size" on pods:  7 / 5
Min replicas:       1
Max replicas:       5
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric plex_queue_size
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ---                    ----                       -------
  Warning  FailedGetPodsMetric           6m10s (x5 over 7m12s)  horizontal-pod-autoscaler  unable to get metric plex_queue_size: unable to fetch metrics from custom metrics API: the server could not find the metric plex_queue_size for pods
  Warning  FailedComputeMetricsReplicas  6m10s (x5 over 7m12s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get pods metric value: unable to get metric plex_queue_size: unable to fetch metrics from custom metrics API: the server could not find the metric plex_queue_size for pods
Note: in the describe output I see some FailedGetPodsMetric and FailedComputeMetricsReplicas events, but I believe these are only from the HPA's initial setup, because the HPA is now able to scrape the metric, and both AbleToScale and ScalingActive are True.
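To double-check that Prometheus-Adapter really serves this metric to the HPA controller, I can also query the custom metrics API directly, with something like the following (default namespace assumed, jq only for readability):

# confirm the metric is registered with the custom metrics API
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name' | grep plex_queue_size

# see the per-pod values the HPA controller reads
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/plex_queue_size" | jq .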