I have configured a node memory usage alert in Prometheus. My alert rule is as follows:
- alert: NodeMemory Usage(development)
  annotations:
    description: '{{$labels.instance}} Memory usage is critical (current value is: {{ $value }})'
    summary: High Memory usage detected
  expr: |
    1 - sum by(node) (
      (node_memory_MemFree{job="node-exporter"} + node_memory_Cached{job="node-exporter"} + node_memory_Buffers{job="node-exporter"})
      * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:
    ) / sum by(node) (
      node_memory_MemTotal{job="node-exporter"}
      * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:
    ) > 0.70
  for: 1s
  labels:
    severity: warning
When a single node exceeds the threshold (here the node name is nodes-3z4c), the alert I receive includes the node name, as shown below:
[FIRING:1] (NodeMemory Usage(development) nodes-3z4c monitoring/k8s warning)
Memory usage is critical (current value is: 0.7148033249432908)
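The problem is that when several nodes exceed the threshold, the node names are not included in the alert notification, and I get a notification like this: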
[FIRING:4] NodeMemory Usage (monitoring/k8s)
Memory usage is critical (current value is: 0.7319404231240473)
Memory usage is critical (current value is: 0.7856648253333621)
Can someone help me solve this?
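One observation that may be relevant: the expression aggregates with `sum by(node)`, so the resulting series should carry only the `node` label. In that case `{{$labels.instance}}` likely renders as empty, which would explain why each description line shows only the value. In the single-alert notification the node name presumably appears in the title because it is among the labels common to all alerts in the group; once several nodes fire together, `node` is no longer common. A minimal sketch of the annotation change, assuming `node` is the label that should appear in each message:

```yaml
# Sketch only: assumes the alert's series keep the `node` label from `sum by(node)`.
- alert: NodeMemory Usage(development)
  annotations:
    # Reference the label that survives the aggregation instead of `instance`.
    description: '{{ $labels.node }} Memory usage is critical (current value is: {{ $value }})'
    summary: High Memory usage detected
  # expr, for, and labels stay exactly as in the rule shown earlier.
```

With this change each firing alert would carry its own node name in the description, regardless of how the notification title is built from the grouped labels.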