I want to create a new receiver and route in Alertmanager to send heartbeats to OpsGenie.

I tried to do this by defining opsgenie_config, but I could not send a ping to the heartbeat in OpsGenie (I can send alerts to OpsGenie with the same API key).

Another approach I found is to use webhook_config (as suggested in #444). My manifest looks like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: opsgenie-webhook
  labels:
    managedBy: team-sre
spec:
  receivers:
  - name: heartbeat
    webhookConfigs:
    - httpConfig:
        basicAuth:
          password:
            name: opsgenie-api-key
            key: address
      url: https://api.opsgenie.com/v2/heartbeats/sre-test-cluster/ping
  route:
    groupWait: 0s
    repeatInterval: 1m
    groupInterval: 1m
    matchers:
    - name: alertname
      value: Watchdog
    receiver: heartbeat

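When I apply the manifest, the receiver and route it describes are not loaded into Alertmanager. When I check the logs, no error is recorded, but there is also no message indicating that the sidecar tried to load the new AlertmanagerConfig.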

Has anyone run into the same problem and knows how to solve it?

1 Answer

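I found the solution in GitHub issue #3970: for basicAuth to be accepted, both a username and a password must be provided. A nice trick is to set the username to ":" in base64 format (Og==). The manifest should be defined as follows: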

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    managedBy: team-sre
  name: alertmanager-opsgenie-config
  namespace: monitoring
spec:
  receivers:
  - name: deadmansswitch
    webhookConfigs:
      # URL of the specific heartbeat; replace <heartbeat-name> with the heartbeat's name
      - url: 'https://api.opsgenie.com/v2/heartbeats/<heartbeat-name>/ping'
        sendResolved: true
        httpConfig:
          basicAuth:
            # reference to the Secret containing the login credentials
            password:
              key: apiKey
              name: opsgenie
            username:
              key: username
              name: opsgenie
  route:
    groupBy:
    - job
    groupInterval: 10s
    groupWait: 0s
    repeatInterval: 10s
    matchers:
      - name: alertname
        value: Watchdog
      - name: namespace
        value: monitoring
    receiver: deadmansswitch

---

apiVersion: v1
kind: Secret
metadata:
  namespace: monitoring
  name: opsgenie
type: Opaque
data:
  # apiKey is encoded in base64
  apiKey: YOUR_PASSWORD
  # ':' in base64; fix suggested in https://github.com/prometheus-operator/prometheus-operator/issues/3970#issuecomment-888893008
  username: Og==
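
The values under `data` in the Secret above must be base64-encoded. A minimal sketch of producing them follows; the helper name `encode_secret_value` and the sample API key are hypothetical and only serve as illustration, so substitute your own key.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Return the base64 form expected under `data` in a Kubernetes Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# The username trick from the answer: a single colon.
username = encode_secret_value(":")
print(username)  # prints "Og=="

# Hypothetical placeholder; replace with your real OpsGenie API key.
api_key = encode_secret_value("example-api-key")
print(api_key)
```

Paste the printed strings into the `username` and `apiKey` fields of the Secret.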

After applying the manifest and triggering an alert that matches the route's conditions, the OpsGenie heartbeat receives the ping.
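
To check, independently of Alertmanager, that an API key is able to ping a heartbeat, the request can be built by hand. This is only a sketch under assumptions that do not come from the answer: it uses the `GenieKey` Authorization scheme of the OpsGenie REST API rather than the basicAuth trick, and the names `build_ping_request` and `send_ping` are hypothetical.

```python
import urllib.parse
import urllib.request

# Base URL taken from the manifests above.
OPSGENIE_HEARTBEATS = "https://api.opsgenie.com/v2/heartbeats"


def build_ping_request(heartbeat_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a ping request for one heartbeat."""
    name = urllib.parse.quote(heartbeat_name, safe="")
    url = f"{OPSGENIE_HEARTBEATS}/{name}/ping"
    # Assumption: OpsGenie's "GenieKey <key>" header scheme.
    headers = {"Authorization": f"GenieKey {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def send_ping(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_ping(build_ping_request("sre-test-cluster", "<your-api-key>"))` performs the actual network call. If it fails with an authorization error, the key itself is the likely problem rather than the Alertmanager configuration.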

Answered 2021-09-28T09:24:03.623