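USER_HZ was implemented as a compromise: although user code could have a hardcoded HZ value that differs from USER_HZ, the Linux kernel historically had an HZ value of 100, so virtually every hardcoded HZ value in existing user code is set to 100.
Here is the essence of what happened: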
The Linux kernel used to have HZ set at a constant 100 for all
architectures. As additional architecture support was added, the HZ
value became variable: e.g. Linux on one machine could have a HZ
value of 1000 while Linux on another machine could have a HZ value
of 100.
This possibility of a variable HZ value caused existing user code,
which had hardcoded an expectation of HZ set to 100, to break due to
the exposure in userspace of kernel jiffies which may have been based
on a HZ value that was not equal to 100.
To prevent the chaos that would occur from years of existing user
code hardcoding a constant HZ value of 100, a compromise was made:
any exposure of kernel jiffies to userspace should be scaled via a
new USER_HZ value -- thus preventing existing user code from
breaking on machines with a different HZ value, while still allowing
the kernel on those machines to have a HZ value different from the
historic 100 value.
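The scaling described above can be sketched in C. This is a minimal illustration of the idea, not the kernel's actual conversion code: the helper name `scale_ticks` and the macro `EXAMPLE_USER_HZ` are hypothetical, and the tick rates are passed in as plain parameters rather than taken from kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration: the historic user-visible tick rate. */
#define EXAMPLE_USER_HZ 100

/*
 * Hypothetical helper: convert a tick count measured at kernel_hz ticks
 * per second into the equivalent count at user_hz ticks per second.
 * The result is truncated toward zero.
 */
static uint64_t scale_ticks(uint64_t jiffies, uint32_t kernel_hz,
                            uint32_t user_hz)
{
    assert(kernel_hz > 0 && user_hz > 0);

    /* Handle whole seconds and the remainder separately so that
     * jiffies * user_hz is never computed on the full count. */
    uint64_t secs = jiffies / kernel_hz;
    uint64_t rem = jiffies % kernel_hz;

    return secs * user_hz + (rem * user_hz) / kernel_hz;
}
```

With these assumptions, 2500 jiffies on a kernel running at HZ = 1000 (2.5 seconds) would be reported to userspace as 250 ticks, which is what code hardcoding a rate of 100 expects. On a kernel already running at HZ = 100 the value passes through unchanged.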
Now, this leaves the question of why some kernel jiffies are exposed to userspace unscaled (e.g. in /proc/timer_list). Thomas Gleixner explains:
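All instances which are de facto APIs, syscalls and also various files in proc/ must be in USER_HZ, because userspace applications depend on the USER_HZ value.
proc/timer_list is exempt from that because it is more of a debug interface which is not subject to the strict kernel API. And we really want to see the real values, not the scaled USER_HZ ones, for that purpose. I hope that answers your question.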
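So, all instances which are part of the strict kernel API are intended to scale kernel jiffies via USER_HZ before exposing them to userspace, while other instances are exempt.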
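From the userspace side, one way to avoid hardcoding 100 at all is to ask the system for the tick rate. The sketch below assumes a POSIX system where `sysconf(_SC_CLK_TCK)` reports the clock ticks per second used by USER_HZ-scaled interfaces; the helper names `user_ticks_per_second` and `ticks_to_seconds` are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Query the user-visible tick rate instead of assuming it is 100. */
static long user_ticks_per_second(void)
{
    long tps = sysconf(_SC_CLK_TCK);
    assert(tps > 0); /* sysconf returns -1 on error */
    return tps;
}

/*
 * Hypothetical helper: convert a tick count read from a USER_HZ-scaled
 * interface into seconds, given the tick rate queried above.
 */
static double ticks_to_seconds(unsigned long long ticks,
                               long ticks_per_second)
{
    assert(ticks_per_second > 0);
    return (double)ticks / (double)ticks_per_second;
}
```

This only applies to values that are scaled before exposure. Values from a debug interface such as /proc/timer_list are in real kernel units, so dividing them by this rate would not be meaningful.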
See also
The Tick Rate: HZ section of Linux Kernel Development, Second Edition, by Robert Love