我已经从 community-helm chart(14.6.0) 部署了 prometheus,它正在运行 alertmanager,它显示不时出现的错误(模板问题),错误消息显示没有任何额外的用处。问题是我已经通过 amtool 重新测试了配置并且在配置中没有收到错误
level=error ts=2021-08-17T14:43:08.787Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="opsgenie/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"38c37c18-5635-48bc-bb69-bda03e232cce\"}"
level=debug ts=2021-08-17T14:43:08.798Z caller=notify.go:685 component=dispatcher receiver=opsgenie integration=opsgenie[0] msg="Notify success" attempts=1
level=error ts=2021-08-17T14:43:08.804Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="opsgenie/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.001,\"requestId\":\"70d2ac84-3422-4fe6-9d8b-e601fdc37b25\"}"
监控正在工作并获得警报只是想了解如何翻译此错误.. 启用调试模式没有提供更多信息可能有什么问题。
警报管理器配置:
global: {}
receivers:
- name: opsgenie
opsgenie_configs:
- api_key: XXX
api_url: https://api.eu.opsgenie.com/
details:
Prometheus alert: ' {{ .CommonLabels.alertname }}, {{ .CommonLabels.namespace }}, {{ .CommonLabels.pod }}, {{ .CommonLabels.dimension_CacheClusterId }}, {{ .CommonLabels.dimension_DBInstanceIdentifier }}, {{ .CommonLabels.dimension_DBClusterIdentifier }}'
http_config: {}
message: '{{ .CommonAnnotations.message }}'
priority: '{{ if eq .CommonLabels.severity "critical" }}P2{{ else if eq .CommonLabels.severity "high" }}P3{{ else if eq .CommonLabels.severity "warning" }}P4{{ else }}P5{{ end }}'
send_resolved: true
tags: ' Prometheus, {{ .CommonLabels.namespace }}, {{ .CommonLabels.severity }}, {{ .CommonLabels.alertname }}, {{ .CommonLabels.pod }}, {{ .CommonLabels.kubernetes_node }}, {{ .CommonLabels.dimension_CacheClusterId }}, {{ .CommonLabels.dimension_DBInstanceIdentifier }}, {{ .CommonLabels.dimension_Cluster_Name }}, {{ .CommonLabels.dimension_DBClusterIdentifier }} '
- name: deadmansswitch
webhook_configs:
- http_config:
basic_auth:
password: XXX
send_resolved: true
url: https://api.eu.opsgenie.com/v2/heartbeats/prometheus-nonprod/ping
- name: blackhole
route:
group_by:
- alertname
- namespace
- kubernetes_node
- dimension_CacheClusterId
- dimension_DBInstanceIdentifier
- dimension_Cluster_Name
- dimension_DBClusterIdentifier
- server_name
group_interval: 5m
group_wait: 10s
receiver: opsgenie
repeat_interval: 5m
routes:
- group_interval: 1m
match:
alertname: DeadMansSwitch
receiver: deadmansswitch
repeat_interval: 1m
- match_re:
namespace: XXX
- match_re:
alertname: HighMemoryUsage|HighCPULoad|CPUThrottlingHigh
- match_re:
namespace: .+
receiver: blackhole
- group_by:
- instance
match:
alertname: PrometheusBlackboxEndpoints
- match_re:
alertname: .*
- match_re:
kubernetes_node: .*
- match_re:
dimension_CacheClusterId: .*
- match_re:
dimension_DBInstanceIdentifier: .*
- match_re:
dimension_Cluster_Name: .*
- match_re: