
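I am facing a health check problem with a Grafana application deployed to GKE, which I am trying to expose through a global HTTP(S) load balancer secured with a Google-managed TLS certificate.
I have two applications deployed to GKE: Grafana and InfluxDB. Both were deployed with Helm, with replicas set to 1. The Services in front of them look like this: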

apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-01-06T13:30:16Z"
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    helm.sh/chart: grafana-7.5.0
  name: grafana
  namespace: monitoring
spec:
  clusterIP: 10.104.7.143
  clusterIPs:
  - 10.104.7.143
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: dashboard
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    meta.helm.sh/release-name: influxdb
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-01-06T12:22:04Z"
  labels:
    app.kubernetes.io/component: influxdb
    app.kubernetes.io/instance: influxdb
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: influxdb
    helm.sh/chart: influxdb-2.4.6
  name: influxdb
  namespace: monitoring
spec:
  clusterIP: 10.104.10.142
  clusterIPs:
  - 10.104.10.142
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8086
    protocol: TCP
    targetPort: http
  - name: rpc
    port: 8088
    protocol: TCP
    targetPort: rpc
  selector:
    app.kubernetes.io/component: influxdb
    app.kubernetes.io/instance: influxdb
    app.kubernetes.io/name: influxdb
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

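When I apply the Ingress and the load balancer is created, the health check passes and the service is reachable as long as Grafana is set as the default backend service. But as soon as I add a rule for another service, the Grafana backend becomes unhealthy (the health check rules have not changed). The same thing happens if Grafana is not set as the default backend and is not the only backend present. The health check rules are the ones Google creates automatically based on the replica set. The backend type is Zonal network endpoint group for both grafana and influx.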
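
For reference, the auto-generated health check can be replaced with an explicit one through a BackendConfig. The following is only a sketch of what that might look like here, not a confirmed fix. It assumes that the named targetPort `dashboard` resolves to container port 3000 and that Grafana answers on its `/api/health` endpoint. The name `grafana-backendconfig` and the interval and threshold values are made up for illustration.

```yaml
# Hypothetical BackendConfig that pins the health check for the Grafana backend.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: grafana-backendconfig   # illustrative name, not from my cluster
  namespace: monitoring
spec:
  healthCheck:
    type: HTTP
    requestPath: /api/health    # Grafana's health endpoint
    port: 3000                  # assumption: the container port behind targetPort "dashboard"
    checkIntervalSec: 15        # illustrative values
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
---
# Fragment of the grafana Service metadata: the extra annotation ties the
# Service to the BackendConfig above (the other fields stay as they are).
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    cloud.google.com/backend-config: '{"default": "grafana-backendconfig"}'
```

With NEG backends the health check probes the pod directly, so `port` would need to be the container port rather than the Service port. In this setup both happen to be 3000 if the assumption above holds.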

  • S1 - Grafana works
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: gke-static-ip
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  defaultBackend:
    service:
      name: grafana
      port:
        number: 3000
  • S2 - the default k8s backend works; Grafana does not work because its backend is unhealthy
spec:
  rules:
    - http:
        paths:
          - path: /grafana/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana
                port:
                  number: 3000
  • S3 - the default k8s backend works; the influxdb backend works; grafana does not work: unhealthy backend
spec:
  rules:
    - http:
        paths:
          - path: /grafana/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana
                port:
                  number: 3000
          - path: /influxdb/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: influxdb
                port:
                  number: 8086
  • S4 - the influxdb backend (set as default) works; grafana does not work: unhealthy backend
spec:
  defaultBackend:
    service:
      name: influxdb
      port:
        number: 8086
  rules:
    - http:
        paths:
          - path: /grafana/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana
                port:
                  number: 3000
          - path: /influxdb/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: influxdb
                port:
                  number: 8086

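In every scenario where grafana fails, the load balancer logs a 502 with the failed_to_connect_to_backend label.

The InfluxDB backend works correctly in all of these scenarios.
I have also checked the firewall rules, and everything seems fine...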
