I want to alert in Prometheus based on a rule that, in plain English, reads something like:

alert if metric X has dropped once by 5% in the last 5 minutes.

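To satisfy this rule, I need to measure the drop between consecutive data points, which arrive at 1-minute intervals, and send an alert if any single drop is greater than or equal to 5%.

I am using a combination of recording rules to achieve this. The algorithm I have in mind is as follows: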

# First group of rules, runs every 1 minute
# Recording rule which measures the percentage drop between consecutive points
((idelta(metricX{job="A"}[2m]) / (metricX{job="A"} offset 1m)) * 100)

# Recording rule which generates a time series of 1 if the percent drop is >= X%, or 0 otherwise
<insert expression here>

# Second group of rules begins which runs every 5 minutes
# Alert rule which reads and sums the timeseries of 1's and 0's over the last 5 minutes and alerts if sum is greater than 0
sum_over_time(timeseries_1_0[5m]) > 0

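How do I write the second recording rule? I have already tried clamp_max/clamp_min, but I don't think that is what I want. What would help me is an if/else construct in PromQL. Not having any experience with time series queries doesn't help either. Any help with this would be greatly appreciated.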

1 Answer

This should work:

record: metricX:idelta_ratio
expr: ((idelta(metricX{job="A"}[2m]) / (metricX{job="A"} offset 1m)) * 100)

# Metric names cannot contain "-", so the threshold is spelled out as "neg5".
record: metricX:idelta_ratio_le_neg5
expr: metricX:idelta_ratio <= bool -5

alert: MetricXDroppedBy5Percent
expr: sum_over_time(metricX:idelta_ratio_le_neg5[5m]) > 0
...
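
For reference, here is a rough sketch of how these three rules might sit in a single rule file, using the two evaluation intervals from the question. The group names are made up for illustration, and the hyphen-free record name `metricX:idelta_ratio_le_neg5` is my own choice (a hyphen is not valid in a metric name). The `bool` modifier is what stands in for if/else here: it makes the comparison return 1 or 0 instead of filtering the series.

```yaml
groups:
  # Hypothetical group name; evaluated every minute, as in the question.
  - name: metricx_recording
    interval: 1m
    rules:
      # Percentage change between the two most recent samples.
      - record: metricX:idelta_ratio
        expr: (idelta(metricX{job="A"}[2m]) / (metricX{job="A"} offset 1m)) * 100
      # 1 if the change is a drop of 5% or more, 0 otherwise.
      - record: metricX:idelta_ratio_le_neg5
        expr: metricX:idelta_ratio <= bool -5

  # Hypothetical group name; evaluated every five minutes, as in the question.
  - name: metricx_alerting
    interval: 5m
    rules:
      # Fires if any of the 0/1 samples in the last 5 minutes was 1.
      - alert: MetricXDroppedBy5Percent
        expr: sum_over_time(metricX:idelta_ratio_le_neg5[5m]) > 0
```

If the 0/1 series is not needed anywhere else, `min_over_time(metricX:idelta_ratio[5m]) <= -5` should express the same alert condition without the intermediate rule.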

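But note that Prometheus does not guarantee that your metrics will be scraped exactly once every minute, or that your rules will be evaluated exactly once every minute. You are also hard-coding the 1m and 2m ranges in your rules, which may fail in interesting ways if your scrape interval changes.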
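
One way to reduce that dependency, offered only as a sketch and not something I have tested against your data: since `idelta` is the last sample minus the one before it, the previous value can be derived as the current value minus `idelta`, which removes the need for `offset 1m` to line up with the scrape interval. This assumes the instant selector and the range selector see the same latest sample, and the `[2m]` window still has to cover at least two scrapes.

```yaml
# Sketch only: derives the previous sample from the current one instead of using "offset 1m".
# previous sample = current sample - idelta, so no offset has to match the scrape interval.
# The [2m] window is still an assumption: it must span at least two scrapes.
record: metricX:idelta_ratio
expr: |
  idelta(metricX{job="A"}[2m])
    / (metricX{job="A"} - idelta(metricX{job="A"}[2m]))
    * 100
```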

Answered 2019-05-06T15:27:57.133