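I'm creating a timechart in Azure AppInsights with a Kusto query to visualize when our web service is within its SLO (and when it isn't), using one of Google's examples for measuring whether a web service is within its error budget: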
SLI = The proportion of sufficiently fast requests, as measured from the load balancer metrics. “Sufficiently fast” is defined as < 400 ms.
SLO = 90% of requests < 400 ms
Measured as:
count of http_requests with a duration less than or equal to "0.4" seconds
divided by count of all http_requests
Assuming a 10-minute inspection interval over a 7-day window, here is my code:
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
let timeRange = range InspectionTime from startTime to endTime step timeStep;
timeRange
| extend RespTimeMax_ms = fastResponseTimeMaxMs
| extend ActualCount = toscalar
(
    requests
    | where timestamp > InspectionTime - timeStep
    | where timestamp <= InspectionTime
    | where success == "True"
    | where duration <= fastResponseTimeMaxMs
    | count
)
| extend TotalCount = toscalar
(
    requests
    | where timestamp > InspectionTime - timeStep
    | where timestamp <= InspectionTime
    | where success == "True"
    | count
)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
Sample query output that I'd like to achieve:
InspectionTime [UTC]     RespTimeMax_ms  ActualCount  TotalCount  Percentage  ErrorBudgetMinPercent  InBudget
2019-05-23T21:53:17.894  400             8,098        8,138       99.51       90                     1
2019-05-23T22:03:17.894  400             8,197        9,184       89.25       90                     0
2019-05-23T22:13:17.894  400             8,002        8,555       93.54       90                     1
The error I get is:
'where' operator: Failed to resolve scalar expression named 'InspectionTime'
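I tried todatetime(InspectionTime), which failed with the same error.
Replacing InspectionTime with other objects of type datetime lets this code execute, but not with the datetime values I want. For example, this snippet executes fine when used within my code sample above: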
| extend ActualCount = toscalar
(
    requests
    | where timestamp > startTime // instead of 'InspectionTime - timeStep'
    | where timestamp <= endTime // instead of 'InspectionTime'
    | where duration <= fastResponseTimeMaxMs
    | count
)
To me, using InspectionTime within toscalar(...) seems to be the crux of the problem, since I can use InspectionTime in similar queries built on range(...) when it is not nested inside toscalar(...).
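Note: I don't want a timechart of request.duration, because that doesn't tell me whether the count of requests above my threshold (400 ms) exceeds our error budget according to the formula defined above.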
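For reference, the ratio defined above can probably be computed without toscalar(...) at all. As far as I understand, toscalar(...) is evaluated once for the whole query rather than once per row, so it cannot see a per-row column such as InspectionTime. The sketch below is one possible alternative, not a verified fix: it buckets requests with bin(...) and counts fast requests with countif(...). It assumes the same requests table and column names as my code above. Note two differences from my desired output: bin(...) labels each bucket by its start time aligned to 10-minute boundaries (not by an end time relative to now()), and intervals with no requests produce no row.

```kusto
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
let startTime = ago(7d);
let endTime = now();
let timeStep = 10m;
requests
| where timestamp > startTime and timestamp <= endTime
| where success == "True"
// One row per 10-minute bucket; countif counts only the "sufficiently fast" requests.
| summarize
    ActualCount = countif(duration <= fastResponseTimeMaxMs),
    TotalCount = count()
    by InspectionTime = bin(timestamp, timeStep)
| extend RespTimeMax_ms = fastResponseTimeMaxMs
// 100.0 forces real-number division; TotalCount is never 0 because empty buckets yield no row.
| extend Percentage = round(100.0 * ActualCount / TotalCount, 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = iff(Percentage >= ErrorBudgetMinPercent, 1, 0)
| order by InspectionTime asc
```

Appending something like `| project InspectionTime, InBudget | render timechart` should then plot the in-budget flag over time, assuming a 0/1 series is an acceptable visualization.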