jvm - How to alert on JVM memory usage in Prometheus with Micrometer and Alertmanager

Question

I'm new to Prometheus and Micrometer. I'm trying to fire an alert when the JVM's heap memory usage goes above a certain threshold.

- alert: P1 - Percentage of heap memory usage on environment more than 3% for 5 minutes.
  expr: sum(jvm_memory_used_bytes{application="x", area="heap"})*100/sum(jvm_memory_max_bytes{application="x", area="heap"}) by (instance) > 3
  for: 5m
  labels:
    priority: P1
    tags: infrastructure, jvm, memory
  annotations:
    summary: "Percentage of heap memory is more than threshold"
    description: "Percentage of heap memory for instance '{{ $labels.instance }}' has been more than 3% ({{ $value }}) for 5 minutes."

This expression works when I use it in Grafana:

But in Prometheus it looks like this:

How can I get my alert to fire when memory usage exceeds a certain limit?

score 4 · Accepted Answer

You want to average the heap usage over time. I came up with the following:

- name: jvm
  rules:
    - alert: jvm_heap_warning
      expr: sum(avg_over_time(jvm_memory_used_bytes{area="heap"}[1m])) by (application, instance) * 100 / sum(avg_over_time(jvm_memory_max_bytes{area="heap"}[1m])) by (application, instance) >= 80
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "JVM heap warning"
        description: "JVM heap of instance `{{$labels.instance}}` from application `{{$labels.application}}` is above 80% for one minute. (current=`{{$value}}%`)"
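For a rule group like this to be evaluated and its alerts forwarded to Alertmanager, Prometheus has to load the rules file and know where Alertmanager is listening. A minimal prometheus.yml sketch, assuming the group is saved in a file named jvm_rules.yml (hypothetical name) under a top-level groups: key, and that Alertmanager runs at localhost:9093 (assumed address):

```yaml
# prometheus.yml (sketch): file name and Alertmanager address are assumptions
global:
  evaluation_interval: 15s        # how often alerting rules are evaluated

rule_files:
  - "jvm_rules.yml"               # hypothetical file whose top-level key is `groups:`

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # assumed Alertmanager host:port
```

Before reloading Prometheus, `promtool check config prometheus.yml` (which also checks the referenced rule files) or `promtool check rules jvm_rules.yml` will report YAML indentation or PromQL syntax errors.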

score 2 · Accepted Answer

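Your alert is correctly configured to fire only when the query result stays above 3 for 5 consecutive minutes. Judging by the graph of the query in Prometheus, that has not happened in the past hour, so no alert was generated.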
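One way to see exactly what the alert evaluates is to record the ratio as its own series and graph that series in Prometheus. With for: 5m, the recorded value must exceed 3 on every rule evaluation for five consecutive minutes; a single evaluation at or below 3 sends the alert back to inactive. A sketch with hypothetical recording-rule and alert names, aggregating both sides by (application, instance) so the division matches series one-to-one:

```yaml
groups:
  - name: jvm_heap
    rules:
      # Hypothetical recording rule: stores the heap usage percentage per application/instance
      - record: instance:jvm_heap_usage:percent
        expr: >
          sum by (application, instance) (jvm_memory_used_bytes{area="heap"}) * 100
          / sum by (application, instance) (jvm_memory_max_bytes{area="heap"})
      # Hypothetical alert: rules in a group run in order, so this sees the value just recorded.
      # It is pending while the condition holds and fires only after 5 minutes without a gap.
      - alert: JvmHeapAboveThreshold
        expr: instance:jvm_heap_usage:percent > 3
        for: 5m
```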

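It is also worth noting that the query used in your rule only returns the instance label on each result. So if you plan to use the application label in your alerts, you need to either adjust the query so that it also returns the application label, or add that label to the rule's labels section.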
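A sketch of both options, reusing the question's selector, threshold and for duration; the alert names are hypothetical. In the first option each sum carries its own by clause, so both sides of the division end up with the same label set:

```yaml
groups:
  - name: jvm
    rules:
      # Option 1: keep `application` in the result by aggregating on it on both sides
      - alert: JvmHeapHighByApplication   # hypothetical name
        expr: >
          sum by (application, instance) (jvm_memory_used_bytes{application="x", area="heap"}) * 100
          / sum by (application, instance) (jvm_memory_max_bytes{application="x", area="heap"}) > 3
        for: 5m
        labels:
          priority: P1
      # Option 2: aggregate by instance only and attach `application` as a static rule label
      - alert: JvmHeapHighStaticLabel     # hypothetical name
        expr: >
          sum by (instance) (jvm_memory_used_bytes{application="x", area="heap"}) * 100
          / sum by (instance) (jvm_memory_max_bytes{application="x", area="heap"}) > 3
        for: 5m
        labels:
          priority: P1
          application: x   # only correct because the selector already filters application="x"
```

The first option scales to many applications with a single rule; the second hard-codes one application per rule.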
