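I'm running into a health-check problem with a Grafana application deployed to GKE, which I'm trying to expose through a global external HTTP(S) load balancer secured with a Google-managed TLS certificate.
I have two applications deployed to GKE: Grafana and InfluxDB. Both are deployed with Helm, with replicas set to 1. The Services in front of them look like this: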
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-01-06T13:30:16Z"
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    helm.sh/chart: grafana-7.5.0
  name: grafana
  namespace: monitoring
spec:
  clusterIP: 10.104.7.143
  clusterIPs:
  - 10.104.7.143
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: dashboard
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    meta.helm.sh/release-name: influxdb
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-01-06T12:22:04Z"
  labels:
    app.kubernetes.io/component: influxdb
    app.kubernetes.io/instance: influxdb
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: influxdb
    helm.sh/chart: influxdb-2.4.6
  name: influxdb
  namespace: monitoring
spec:
  clusterIP: 10.104.10.142
  clusterIPs:
  - 10.104.10.142
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8086
    protocol: TCP
    targetPort: http
  - name: rpc
    port: 8088
    protocol: TCP
    targetPort: rpc
  selector:
    app.kubernetes.io/component: influxdb
    app.kubernetes.io/instance: influxdb
    app.kubernetes.io/name: influxdb
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
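When I apply the Ingress and the load balancer is created, everything works as long as Grafana is the default backend: the health check passes and the service is reachable. But as soon as I add any rule for another service, the Grafana backend becomes unhealthy (the health check itself does not change). The same thing happens when Grafana is not the default backend and is not the only backend present. The health checks are the ones Google creates automatically based on the ReplicaSet. The backend type is Zonal network endpoint group for both Grafana and InfluxDB.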
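For reference, the GKE Ingress controller can derive the auto-created health check from an HTTP readinessProbe on the serving container; without one it falls back to its defaults (an HTTP GET on `/`). A minimal sketch of such a probe for the Grafana container, assuming the container port is named `dashboard` (as the Service's `targetPort` suggests) and that Grafana's unauthenticated `/api/health` endpoint is used; the container name and probe timings are illustrative, not taken from the chart:

```yaml
# Hypothetical fragment of the Grafana Deployment's pod template (values assumed).
# GKE Ingress can copy the path of an httpGet readinessProbe on the serving
# port into the load balancer health check it generates for the NEG backend.
containers:
  - name: grafana
    ports:
      - name: dashboard
        containerPort: 3000
    readinessProbe:
      httpGet:
        path: /api/health   # Grafana health endpoint; answers 200 without a login redirect
        port: dashboard
      initialDelaySeconds: 30
      periodSeconds: 10
```

With Helm, this would normally be set through the chart's probe values rather than by editing the Deployment directly.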
- S1: Grafana works
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: gke-static-ip
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  defaultBackend:
    service:
      name: grafana
      port:
        number: 3000
- S2: the default k8s backend works; Grafana does not, because its backend is unhealthy
spec:
  rules:
  - http:
      paths:
      - path: /grafana/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: grafana
            port:
              number: 3000
- S3: the default k8s backend works and the InfluxDB backend works; Grafana does not (unhealthy backend)
spec:
  rules:
  - http:
      paths:
      - path: /grafana/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: grafana
            port:
              number: 3000
      - path: /influxdb/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: influxdb
            port:
              number: 8086
- S4: InfluxDB as the default backend works; Grafana does not (unhealthy backend)
spec:
  defaultBackend:
    service:
      name: influxdb
      port:
        number: 8086
  rules:
  - http:
      paths:
      - path: /grafana/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: grafana
            port:
              number: 3000
      - path: /influxdb/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: influxdb
            port:
              number: 8086
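In every failing Grafana scenario, the load balancer logs a 502 with the failed_to_connect_to_backend label.
The InfluxDB backend works in all of these scenarios.
I've also checked the firewall rules, and everything seems fine...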
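One way to take the auto-generated health check out of the picture is to pin it explicitly with a BackendConfig attached to the `grafana` Service. A sketch, assuming the hypothetical name `grafana-hc` and Grafana's `/api/health` endpoint on container port 3000 (for NEG backends the check is sent directly to the Pod IP, so the port is the container port rather than the Service port; both are 3000 here). The interval and threshold values are illustrative:

```yaml
# Hypothetical BackendConfig (name and timings assumed) that overrides the
# health check GKE would otherwise generate for the grafana backend service.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: grafana-hc
  namespace: monitoring
spec:
  healthCheck:
    type: HTTP
    requestPath: /api/health
    port: 3000            # container port, because NEG health checks target Pod IPs
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
# Fragment, not a standalone manifest: add this annotation alongside the existing
# cloud.google.com/neg annotation on the grafana Service's metadata.
# metadata:
#   annotations:
#     cloud.google.com/backend-config: '{"default": "grafana-hc"}'
```

If the backend stays unhealthy with an explicit check like this, that would point away from the health check path and toward reachability of the Grafana Pod itself.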