I have a single-instance Alertmanager cluster, and I see the following warning in the Alertmanager container:

level=warn ts=2021-11-03T08:50:44.528Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4125 limit=4096

Alertmanager version information:

Version Information
Branch: HEAD
BuildDate: 20190708-14:31:49
BuildUser: root@868685ed3ed0
GoVersion: go1.12.6
Revision: 1ace0f76b7101cccc149d7298022df36039858ca
Version: 0.18.0

Alertmanager metrics:

# HELP alertmanager_cluster_members Number indicating current number of members in cluster.
# TYPE alertmanager_cluster_members gauge
alertmanager_cluster_members 1
# HELP alertmanager_cluster_messages_pruned_total Total number of cluster messages pruned.
# TYPE alertmanager_cluster_messages_pruned_total counter
alertmanager_cluster_messages_pruned_total 23020
# HELP alertmanager_cluster_messages_queued Number of cluster messages which are queued.
# TYPE alertmanager_cluster_messages_queued gauge
alertmanager_cluster_messages_queued 4125
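The metrics above are in the Prometheus text exposition format. The following is a minimal Go sketch, assuming unlabelled samples exactly as shown, that reads those lines and compares the queued gauge with the limit=4096 reported in the warning. The function name parseGauges is hypothetical and is not part of any Alertmanager or Prometheus API.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// queueLimit mirrors the limit=4096 value from the warning log line.
const queueLimit = 4096

// parseGauges (hypothetical helper) reads Prometheus text exposition lines
// and returns sample values keyed by metric name. "# HELP" / "# TYPE"
// comment lines are skipped. Assumption: samples carry no labels or
// timestamps, as in the metrics shown above.
func parseGauges(text string) (map[string]float64, error) {
	values := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // labelled or timestamped samples are out of scope here
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, err
		}
		values[fields[0]] = v
	}
	return values, sc.Err()
}

func main() {
	sample := `# TYPE alertmanager_cluster_members gauge
alertmanager_cluster_members 1
alertmanager_cluster_messages_pruned_total 23020
alertmanager_cluster_messages_queued 4125`

	m, err := parseGauges(sample)
	if err != nil {
		panic(err)
	}
	queued := m["alertmanager_cluster_messages_queued"]
	// How far the queue is above the limit named in the warning.
	fmt.Printf("queued=%.0f limit=%d over=%.0f\n", queued, queueLimit, queued-queueLimit)
}
```

Run as-is, it prints `queued=4125 limit=4096 over=29`, which matches the current and limit fields of the log line.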
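
  • How can we see the messages that are queued in Alertmanager?

  • When messages are dropped because too many are queued, do we lose alerts?

  • If there is logic that prunes messages periodically (every 15 minutes), why do messages still pile up in the queue?

  • When Alertmanager prunes messages periodically, do we lose alerts?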

I am new to alerting. Could you help answer the questions above?
