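I have a table that shows latency data, and now I want to write an alert query that fires when the median latency of a request (method + uri) is above 3000 milliseconds (3 seconds).
The query I use for that latency table is: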
index=ms-app environment=prod AND "*"
| eval uri=replace(mvindex(split('request.uri', "?"), 0), "\/\d+[-+\w]+", "/:n"), methodOverride='request.headers.X-HTTP-Method-Override'
| eval methodOverrideStr = if(isnull(methodOverride) OR methodOverride=="null", "", "(" + methodOverride + ")")
| eval request = 'request.method' + methodOverrideStr + " " + uri + " " + 'response.httpStatusCode'
| stats
min(stats.overallResponseTimeInMilliSeconds) as "Min",
avg(stats.overallResponseTimeInMilliSeconds) as avg_latency,
max(stats.overallResponseTimeInMilliSeconds) as "Max",
median(stats.overallResponseTimeInMilliSeconds) as "Median",
perc95(stats.overallResponseTimeInMilliSeconds) as "95th %",
count(request) as "# req total", count(eval('stats.overallResponseTimeInMilliSeconds' > 3000)) as "#>3s",
count(eval('stats.overallResponseTimeInMilliSeconds' > 5000)) as "#>5s",
count(eval('stats.overallResponseTimeInMilliSeconds' > 10000)) as "#>10s" by request
| eval "Avg" = round(avg_latency, 0)
| table request, "Median"
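This produces a table showing the median latency per method + uri, for example:
- POST /first-endpoint 1000
- GET /second-endpoint 2000
- DELETE /third-endpoint 1500
- POST /fourth-endpoint 4000
- GET /fifth-endpoint 4500
Now I am trying to write a query that shows only the method + uri combinations whose median latency is above 3 seconds, so that I can create a Splunk alert that flags which endpoints have high latency. This is what I tried: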
index=ms-app environment=prod AND "*"
| eval uri=replace(mvindex(split('request.uri', "?"), 0), "\/\d+[-+\w]+", "/:n"), methodOverride='request.headers.X-HTTP-Method-Override'
| eval methodOverrideStr = if(isnull(methodOverride) OR methodOverride=="null", "", "(" + methodOverride + ")")
| eval request = 'request.method' + methodOverrideStr + " " + uri + " " + 'response.httpStatusCode'
| stats
median(stats.overallResponseTimeInMilliSeconds) as "Median"
| table request, "Median" > 3000
It should show this:
- POST /fourth-endpoint 4000
- GET /fifth-endpoint 4500
But it only shows the same results as the first query.
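One possible direction, as an untested sketch rather than a confirmed fix: `table` only selects which fields to display and does not filter rows, so the `> 3000` comparison probably needs to move into a `where` clause placed after `stats`. The sketch also assumes the `stats` call should keep the `by request` grouping from the first query, since without it the `request` field is not carried through the aggregation.

```spl
index=ms-app environment=prod AND "*"
| eval uri=replace(mvindex(split('request.uri', "?"), 0), "\/\d+[-+\w]+", "/:n"), methodOverride='request.headers.X-HTTP-Method-Override'
| eval methodOverrideStr = if(isnull(methodOverride) OR methodOverride=="null", "", "(" + methodOverride + ")")
| eval request = 'request.method' + methodOverrideStr + " " + uri + " " + 'response.httpStatusCode'
| stats median(stats.overallResponseTimeInMilliSeconds) as Median by request
| where Median > 3000
| table request, Median
```

With the example data above, this would be expected to keep only the rows whose median exceeds 3000. An alert could then be set to trigger when the number of results is greater than zero.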