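Should I be looking at CPU utilization, network traffic, or HTTP response time checks? I ran some tests with Apache AB from the same server (e.g. ab -k -n 500000 -c 100 http://192.XXX.XXX.XXX/) and monitored the load average. Even with the load between 1.0 and 1.50 on a single-core server, the "time per request" (mean) stayed quite stable at only about 140 ms for a simple dynamic page that does a single Redis set/get operation. I'm confused, because the general advice is to launch a new instance once you pass the 70% CPU utilization threshold.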
1 Answer
70% CPU utilization is a good rule of thumb for CPU-bound applications like nginx. CPU time is kind of like body temperature: it actually hides a lot of different things, but is a good general indicator of health. Load average is a separate measure of how many processes are waiting to be scheduled. The reason the rule is 70% (or 80%) utilization is that, past this point, CPU-bound applications tend to suffer contention-induced latency and non-linear performance.
You can test this yourself by plotting throughput and latency (median and 90th percentile) against CPU utilization on your setup. Finding the inflection point for your particular system is important for capacity planning.
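As a rough sketch of that analysis, assuming you have already collected (CPU percent, latency in ms) pairs from your own load tests, something like the following groups samples into CPU buckets and reports the median and 90th percentile latency per bucket. The function names, the 10-point bucket width, and the "p90 more than doubles" threshold are all illustrative assumptions rather than a standard method; throughput could be bucketed the same way.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_cpu(samples, bucket_width=10):
    """Group (cpu_percent, latency_ms) samples into CPU utilization buckets.

    Returns a list of (bucket_start, median_ms, p90_ms), sorted by bucket.
    """
    buckets = defaultdict(list)
    for cpu, latency in samples:
        # Clamp so that a reading of exactly 100% lands in the top bucket.
        start = min(int(cpu // bucket_width) * bucket_width, 100 - bucket_width)
        buckets[start].append(latency)
    return [
        (start, percentile(vals, 50), percentile(vals, 90))
        for start, vals in sorted(buckets.items())
    ]


def find_inflection(rows, factor=2.0):
    """Return the first bucket whose p90 exceeds factor x the lowest bucket's p90.

    The factor is an arbitrary assumption; returns None if no bucket qualifies.
    """
    if not rows:
        return None
    baseline_p90 = rows[0][2]
    for bucket, _median, p90 in rows:
        if p90 > factor * baseline_p90:
            return bucket
    return None


if __name__ == "__main__":
    # Made-up sample data purely for illustration.
    samples = [(15, 140), (18, 142), (45, 141), (48, 150),
               (72, 200), (75, 320), (88, 600), (92, 900)]
    rows = latency_by_cpu(samples)
    for bucket, median, p90 in rows:
        print(f"{bucket:3d}-{bucket + 10:3d}%  median={median} ms  p90={p90} ms")
    print("inflection bucket:", find_inflection(rows))
```

With real data you would want many samples per bucket before trusting the percentiles, and plotting the rows is usually more informative than a single threshold.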
A very good writeup of this phenomenon is given in Facebook's original paper on Dyno, their system for measuring throughput of PHP under load.