kubernetes-pod - What does the container_cpu_cfs_throttled_seconds_total metric mean?

Question

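cAdvisor exposes two metrics, container_cpu_cfs_throttled_seconds_total and container_cpu_cfs_throttled_periods_total.

I am confused about what they mean.

I found two explanations:

A container runs with a CPU limit. When its CPU usage exceeds the limit, the container is "throttled" and the throttled time is added to container_cpu_cfs_throttled_seconds_total.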

That would mean:
 (1) rate(container_cpu_cfs_throttled_seconds_total) > 0 only when the container's CPU usage exceeds its limit.
 (2) We can use this metric to alert when a container exceeds its CPU limit.

When the host is under heavy CPU pressure, it "throttles" containers according to Pod QoS (Guaranteed > Burstable > BestEffort)...

That would mean:
 (1) container_cpu_cfs_throttled_seconds_total increases regardless of how much CPU the container uses or what its CPU limit is.
 (2) This metric cannot be used to alert when a container exceeds its CPU limit.

score 8 · Accepted Answer

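container_cpu_cfs_throttled_seconds_total is the sum of all throttled durations, i.e. the time during which the container was throttled, meaning it was stopped by CFS cgroup bandwidth control.

Because every stopped thread adds its throttled duration to container_cpu_cfs_throttled_seconds_total, this number can become very large and may not be helpful to you (unless you have a known, fixed number of threads).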

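This is why CPU throttling alerts are usually based on the metric throttled percentage := container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total, i.e. the percentage of CPU periods in which the container ran but was throttled (stopped from running for the whole CPU period).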
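To make the ratio concrete, here is a minimal Python sketch. It assumes you have two scrapes of the counters and want the ratio over the window between them rather than over the container's lifetime. The function name `throttled_ratio` and the tuple layout are hypothetical, chosen only for illustration.

```python
def throttled_ratio(prev, curr):
    """Fraction of CFS periods that were throttled between two scrapes.

    prev and curr are (throttled_periods_total, periods_total) pairs,
    i.e. samples of container_cpu_cfs_throttled_periods_total and
    container_cpu_cfs_periods_total taken at two points in time.
    """
    d_throttled = curr[0] - prev[0]
    d_periods = curr[1] - prev[1]
    # No elapsed periods (idle container) or a counter reset: report 0.0
    # instead of dividing by zero or returning a negative ratio.
    if d_periods <= 0 or d_throttled < 0:
        return 0.0
    return d_throttled / d_periods


# Hypothetical sample values: 30 of the 120 periods in the window were throttled.
print(throttled_ratio((10, 100), (40, 220)))  # 0.25
```

Working on deltas means an old burst of throttling does not keep an alert firing forever. What threshold to alert on is a judgment call that this sketch does not settle.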

For more detail, see this talk on CFS and CPU scheduling, or read the corresponding article.

score 4 · Accepted Answer

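Suppose an httpbin container is running on machine1, and its deployment sets a limit of at most 1 CPU. machine1 has 2 CPUs, so httpbin can use up to half of the available resources.

If the httpbin container tries to use more than 1 CPU, Kubernetes does not kill the container. It throttles it. If this happens often, you probably want to be alerted and fix the deployment. The other case is when machine1 runs several containers and CPU is scarce: then it throttles all the containers it has.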
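The throttling behaviour described above can be sketched with a toy model in Python. It is a simplification, not the kernel's actual accounting. It assumes the commonly used 100 ms CFS period, so a 1 CPU limit becomes a quota of 100 ms of CPU time per period. It also assumes that unfinished work carries over to the next period. The function `simulate_cfs` and its inputs are hypothetical.

```python
def simulate_cfs(demand_ms, quota_ms=100.0):
    """Toy model of CFS bandwidth control for one container.

    demand_ms[i] is the CPU time (in ms) the container wants in period i.
    quota_ms is the CPU time allowed per period (assumed: 100 ms = 1 CPU).
    Returns (periods, throttled_periods, leftover_backlog_ms).
    """
    periods = 0
    throttled_periods = 0
    backlog = 0.0  # work that could not run because the quota ran out
    for want in demand_ms:
        periods += 1
        need = backlog + want
        used = min(need, quota_ms)
        backlog = need - used
        if backlog > 0:
            # Quota exhausted while work was still pending:
            # the container sits throttled for the rest of this period.
            throttled_periods += 1
    return periods, throttled_periods, backlog


# httpbin wants 150 ms of CPU in the second period (more than 1 CPU's worth).
print(simulate_cfs([50, 150, 50, 50]))  # (4, 1, 0.0)
```

In this model the container is never killed. The excess work is delayed into the following period, which is why throttling shows up as added latency rather than as restarts.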

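container_cpu_cfs_throttled_seconds_total is the total time, in seconds, that the container has been throttled. container_cpu_cfs_throttled_periods_total is the number of period intervals in which it was throttled.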
