1

我希望我的 prometheus 服务器从 pod 中抓取指标。

我按照以下步骤操作:

  1. 使用部署创建了一个 pod -kubectl apply -f sample-app.deploy.yaml
  2. 暴露相同的使用kubectl apply -f sample-app.service.yaml
  3. 使用部署的 Prometheus 服务器helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
  4. 创建了一个 serviceMonitorkubectl apply -f service-monitor.yaml用于为 prometheus 添加目标。

所有 pod 都在运行,但是当我打开 prometheus 仪表板时,在仪表板 UI 的 status>targets 下,我没有看到sample-app 服务作为 prometheus 目标。

我已经验证了以下内容:

  1. 我可以看到sample-app当我执行kubectl get servicemonitors
  2. 我可以在 at 下看到 sample-app 以 prometheus 格式公开指标/metrics

此时我进一步调试,使用 进入prometheus pod kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh ,并看到proemetheus 配置(/etc/config/prometheus.yml)没有sample-app 作为作业之一,所以我编辑了configmap 使用

kubectl edit cm prometheus-server -o yaml 添加

    - job_name: sample-app
        static_configs:
        - targets:
          - sample-app:8080

假设所有其他字段(例如抓取间隔),scrape_timeout 保持默认值。

我可以看到 /etc/config/prometheus.yml 中反映了同样的情况,但 prometheus 仪表板仍然没有显示sample-app为状态>目标下的目标。

以下是 prometheus-server 和服务监视器的 yaml。

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

服务监视器的 yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus 
4

1 回答 1

2

您需要使用prometheus-community/kube-prometheus-stack包含 Prometheus 运算符的图表,以便根据 ServiceMonitor 资源自动更新 Prometheus 的配置。

您使用的prometheus-community/prometheus图表不包括在 Kubernetes API 中监视 ServiceMonitor 资源并相应更新 Prometheus 服务器的 ConfigMap 的 Prometheus 操作符。

您的集群中似乎安装了必要的 CustomResourceDefinitions (CRD),否则您将无法创建 ServiceMonitor 资源。这些未包含在prometheus-community/prometheus图表中,因此它们之前可能已添加到您的集群中。

于 2021-06-24T14:20:01.540 回答