
I have a question about custom metrics with Kubernetes and Prometheus on Amazon AWS. By default, the CPU and memory metrics work fine. The Prometheus http_requests metric does not; this is the error:

$ kubectl describe hpa hpa-deploy
Name:                       hpa-deploy
Namespace:                  default
Labels:                     <none>
Annotations:                kubectl.kubernetes.io/last-applied-configuration:
                              {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-deploy","namespace":"default...
CreationTimestamp:          Thu, 06 Jun 2019 11:06:48 +0000
Reference:                  Deployment/django
Metrics:                    ( current / target )
  "http_requests" on pods:  <unknown> / 2k
Min replicas:               1
Max replicas:               10
Deployment pods:            1 current / 0 desired
Conditions:
  Type           Status  Reason               Message
  ----           ------  ------               -------
  AbleToScale    True    SucceededGetScale    the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetPodsMetric  the HPA was unable to compute the replica count: unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests for pods
Events:
  Type     Reason               Age                     From                       Message
  ----     ------               ----                    ----                       -------
  Warning  FailedGetPodsMetric  8m53s (x414 over 114m)  horizontal-pod-autoscaler  unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server is currently unable to handle the request (get pods.custom.metrics.k8s.io *)
  Warning  FailedGetPodsMetric  3m48s (x12 over 6m36s)  horizontal-pod-autoscaler  unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests for pods

I installed Prometheus with helm, as the GitHub project suggests, and checked the API:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": []
}

Then I added the following rule:

$ kubectl edit cm my-release-prometheus-adapter
    rules:
    - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
      resources:
        overrides:
          kubernetes_namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

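The walkthrough says that after adding a new rule, the API check should return values inside "resources": [], but it does not, and I don't know why.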

This is my HPA code:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-deploy
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: django
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests
        target:
          type: Value
          averageValue: 2k

Also, I am using an Nginx-based ingress controller, but kubectl describe on the HPA for the ingress and service shows:

$ kubectl describe hpa hpa-ingress
Name:                                                      hpa-ingress
Namespace:                                                 default
Labels:                                                    <none>
Annotations:                                               kubectl.kubernetes.io/last-applied-configuration:
                                                             {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-ingress","namespace":"defaul...
CreationTimestamp:                                         Thu, 06 Jun 2019 11:06:48 +0000
Reference:                                                 Ingress/test-ingress
Metrics:                                                   ( current / target )
  "http_requests" on Ingress/test-ingress (target value):  <unknown> / 2k
Min replicas:                                              1
Max replicas:                                              10
Ingress pods:                                              0 current / 0 desired
Conditions:
  Type         Status  Reason          Message
  ----         ------  ------          -------
  AbleToScale  False   FailedGetScale  the HPA controller was unable to get the target's current scale: the server could not find the requested resource
Events:
  Type     Reason          Age                     From                       Message
  ----     ------          ----                    ----                       -------
  Warning  FailedGetScale  2m40s (x473 over 122m)  horizontal-pod-autoscaler  the server could not find the requested resource

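I'm not sure whether I have to export the http_requests metric from the pods manually; if that is the case, how do I do it? The documentation is all "copy and paste and everything will work", but that is not the case. Please be as detailed as possible; I am really new to this topic. Thank you very much.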


1 Answer


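In my case, this was caused by a wrong Prometheus endpoint. I found this out by setting the log level to 6, which showed that the Prometheus queries were failing with 404 errors.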

https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/README.md#why-isnt-my-metric-showing-up
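
As a rough sketch of where those two settings live, the fragment below shows the relevant arguments on the prometheus-adapter container. It is only a fragment, not a complete manifest. The URL is a hypothetical placeholder, since the question does not name the Prometheus Service; the container name is also an assumption. If the adapter was installed with helm, the chart normally sets these arguments for you, so check the chart's values rather than editing the Deployment by hand.

```yaml
# Fragment of the prometheus-adapter Deployment's pod spec (not a complete manifest).
containers:
  - name: prometheus-adapter   # assumed container name
    args:
      # Hypothetical URL: point this at the Service that actually fronts your
      # Prometheus server (scheme, host, port, and any path prefix must match).
      - --prometheus-url=http://prometheus-server.default.svc:80
      # Verbosity 6 makes the adapter log the requests it sends to Prometheus,
      # which is how the 404 responses described above became visible.
      - --v=6
```

After the adapter restarts, its logs should show the queries it issues. If they still return 404, the endpoint is likely still wrong, and "resources" in the custom metrics API response will stay empty.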

answered 2021-05-26T01:21:00.330