I have an AWS EKS cluster with Prometheus and Prometheus-Adapter set up. The problem is that I have tried different values for the target (threshold) of my HPA metric, and with some values the scale-up works, while with others it fails.
Here is my HPA:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: tianbing-xxxxxxxxx-xxxxxxx-hpa
  labels:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tianbing-xxxxxxxxx-xxxxxxx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: plex_queue_size
      target:
        type: AverageValue
        averageValue: 2
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
    scaleDown:
      stabilizationWindowSeconds: 300
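The metric comes from Prometheus through Prometheus-Adapter. The adapter rule for it looks roughly like the following (a simplified sketch; the exact seriesQuery, label overrides and metricsQuery are approximations rather than my literal config):

rules:
- seriesQuery: 'plex_queue_size'
  resources:
    overrides:
      namespace: {resource: "namespace"}   # label names depend on the scrape config
      pod:       {resource: "pod"}
  name:
    matches: "plex_queue_size"
    as: "plex_queue_size"
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'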
In this case, with the target value set to 2, my HPA works perfectly ✅. Here is the log: you can see that when the current value was 8, it successfully scaled up to 8/2 = 4 replicas ✅ (the calculation is spelled out after the log):
$ kubectl get hpa -w
NAME                             REFERENCE                               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   <unknown>/2   1         5         1          31s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   0/2           1         5         1          46s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         1          3m34s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         4          3m49s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/2           1         5         4          4m5s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7666m/2       1         5         5          4m20s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7666m/2       1         5         5          4m25s
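As far as I understand, this matches the standard formula the HPA controller uses (from the Kubernetes HPA documentation):

desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
                = ceil( 1 * 8 / 2 )
                = 4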
However, if I now update my HPA manifest and change the target value to something larger, for example 5 (I delete the whole HPA and create a new one):
  metrics:
  - type: Pods
    pods:
      metric:
        name: plex_queue_size
      target:
        type: AverageValue
        averageValue: 5
From the log you can see that the metric is scraped (it starts at 0), but even after waiting for a long time, no scale-up ever happens ❌. The current value is collected, yet no scale-up action is taken ❌. In my case I should get 8/5, i.e. 2 desired replicas (see the calculation after the log), but nothing happens ❌.
This is what I get from kubectl get hpa and from describing the HPA:
NAME                             REFERENCE                               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   <unknown>/5   1         5         1          75s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   0/5           1         5         1          91s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   8/5           1         5         1          3m32s
tianbing-xxxxxxxxx-xxxxxxx-hpa   Deployment/tianbing-xxxxxxxxx-xxxxxxx   7/5           1         5         1          4m18s
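By the same formula I would expect a scale-up to 2 replicas here, but REPLICAS never moves from 1:

desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
                = ceil( 1 * 8 / 5 )
                = 2    (expected, but it never happens)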
$ kubectl describe horizontalpodautoscaler.autoscaling/tianbing-xxxxxxxxx-xxxxxxx-hpa
Name:               tianbing-xxxxxxxxx-xxxxxxx-hpa
Namespace:          default
Labels:             app.kubernetes.io/managed-by=Helm
Annotations:        meta.helm.sh/release-name: tianbing
                    meta.helm.sh/release-namespace: default
CreationTimestamp:  Wed, 10 Mar 2021 14:14:13 -0800
Reference:          Deployment/tianbing-xxxxxxxxx-xxxxxxx
Metrics:            ( current / target )
  "plex_queue_size" on pods:  7 / 5
Min replicas:       1
Max replicas:       5
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric plex_queue_size
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ---                    ----                       -------
  Warning  FailedGetPodsMetric           6m10s (x5 over 7m12s)  horizontal-pod-autoscaler  unable to get metric plex_queue_size: unable to fetch metrics from custom metrics API: the server could not find the metric plex_queue_size for pods
  Warning  FailedComputeMetricsReplicas  6m10s (x5 over 7m12s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get pods metric value: unable to get metric plex_queue_size: unable to fetch metrics from custom metrics API: the server could not find the metric plex_queue_size for pods
Note: in the describe output I see some FailedGetPodsMetric and FailedComputeMetricsReplicas events, but I believe these are only from the HPA's initial setup, because the HPA is now able to scrape the metric, and both AbleToScale and ScalingActive are True.
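To double-check that Prometheus-Adapter really serves this metric to the HPA controller, I can also query the custom metrics API directly, with something like the following (default namespace assumed, jq only for readability):

# confirm the metric is registered with the custom metrics API
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name' | grep plex_queue_size

# see the per-pod values the HPA controller reads
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/plex_queue_size" | jq .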