These instructions are hints that suggest the CPU try to prefetch a cache line into the cache. Because they're hints, a CPU can ignore them completely.
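If the CPU does support them, then the CPU will try to prefetch, but on CPUs that won't walk the page tables for a hint it will give up (and won't prefetch) if a TLB miss would be involved. This is where most people get it wrong (e.g. they fail to do "preloading", where you insert a dummy read to force the TLB load so that it doesn't prevent the prefetch from working).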
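A minimal sketch of that "preloading" idea, assuming GCC or Clang (`__builtin_prefetch` is their builtin, which on x86 normally compiles to one of the PREFETCHh instructions). The function name `preload_and_prefetch` and the 4 KiB page and 64-byte line constants are assumptions made for illustration, not values taken from any manual:

```c
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define ASSUMED_LINE_SIZE 64u    /* assumption: 64-byte cache lines */

/* Hypothetical helper: do one dummy read per page so the TLB entries are
   loaded, then issue a prefetch hint for every cache line in the buffer.
   Returns the number of dummy reads performed. */
static size_t preload_and_prefetch(const unsigned char *buf, size_t len)
{
    size_t reads = 0;

    if (len == 0)
        return 0;

    for (size_t off = 0; off < len; off += ASSUMED_PAGE_SIZE) {
        /* A volatile read is a real load, not a hint, so a TLB miss
           here gets serviced instead of being silently dropped. */
        (void)*(const volatile unsigned char *)(buf + off);
        reads++;
    }

    /* The buffer may not start on a page boundary, so also touch the
       last byte in case it sits on a page the loop did not reach. */
    (void)*(const volatile unsigned char *)(buf + len - 1);
    reads++;

    for (size_t off = 0; off < len; off += ASSUMED_LINE_SIZE)
        __builtin_prefetch(buf + off, 0, 3);  /* read access, keep in all cache levels */

    return reads;
}
```

In real code you'd probably only preload the next page a little before you reach it rather than touching the whole buffer up front, and you'd measure to see whether it helps at all on your CPU.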
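The amount of data prefetched is 32 bytes or more, depending on the CPU, etc. You can use CPUID to determine the actual size (CPUID function 0x00000004, the "System Coherency Line Size" returned in EBX bits 0 to 11, which holds the line size minus 1).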
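Something like the following could be used to read that value, assuming GCC or Clang on x86 (`__get_cpuid_count` comes from their `<cpuid.h>`). The helper names `line_size_from_ebx` and `query_line_size` are made up for this sketch. Note that the field is 12 bits wide (EBX bits 0 to 11) and stores the line size minus 1, and that not every CPU reports cache details through this function, in which case the sketch returns 0:

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Decode the EBX value returned by CPUID function 4:
   bits 0 to 11 hold (System Coherency Line Size - 1). */
static unsigned line_size_from_ebx(uint32_t ebx)
{
    return (ebx & 0xFFFu) + 1u;
}

/* Hypothetical wrapper: line size in bytes of the first cache described
   by CPUID function 4, or 0 if that information is not available. */
static unsigned query_line_size(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid_count(4, 0, &eax, &ebx, &ecx, &edx))
        return 0;               /* function 4 not supported */
    if ((eax & 0x1Fu) == 0)
        return 0;               /* cache type 0: nothing described here */
    return line_size_from_ebx(ebx);
#else
    return 0;                   /* not an x86 build */
#endif
}
```

If `query_line_size()` returns 0 you'd need a fallback (a different CPUID function, or just a conservative default).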
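If you prefetch too late it doesn't help, and if you prefetch too early the data can be evicted from the cache before it's used (which also doesn't help). There's an appendix in Intel's "IA-32 Intel Architecture Optimization Reference Manual" that describes how to calculate when to prefetch, called "Mathematics of Prefetch Scheduling Distance", which you should probably read.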
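As a rough illustration of the "not too late, not too early" trade-off, here is a deliberately simplified rule of thumb. It is not the formula from Intel's appendix: it just divides an assumed memory latency by an assumed cost per loop iteration and rounds up. The function name and both parameters are hypothetical, and the real numbers have to come from measurement:

```c
/* Simplified estimate: how many loop iterations ahead to prefetch so the
   data arrives roughly when it is needed.
   latency_cycles:  assumed cost of a cache miss, in cycles
   cycles_per_iter: assumed cost of one loop iteration, in cycles */
static unsigned prefetch_distance(unsigned latency_cycles,
                                  unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0;  /* no meaningful answer, so don't prefetch ahead */

    /* Ceiling division: rounding down would make the prefetch too late. */
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}
```

For example, with an assumed 200 cycle miss and 25 cycles per iteration this gives a distance of 8 iterations. Making the distance much larger than that just increases the chance the line gets evicted again before you use it.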
Also don't forget that prefetching can decrease performance (e.g. cause data that is needed to be evicted to make room) and that if you don't prefetch anything the CPU has a hardware prefetcher that will probably do it for you anyway. You should probably also read about how this hardware prefetcher works (and when it doesn't). For example, for sequential reads (e.g. memcmp()) the hardware prefetcher does it for you and using explicit prefetches is mostly a waste of time. It's probably only worth bothering with explicit prefetches for "random" (non-sequential) accesses that the CPU's hardware prefetcher can't/won't predict.
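To make the "random access" case concrete, here's a sketch of a loop that reads `data` through an index array, which is the kind of pattern a hardware prefetcher generally can't predict. It assumes GCC or Clang for `__builtin_prefetch`; the name `sum_indirect` and the distance `PREFETCH_AHEAD` are invented for the example, and the distance in particular is a guess you'd have to tune:

```c
#include <stddef.h>

#define PREFETCH_AHEAD 8  /* assumption: tune this by measuring */

/* Sum data[idx[i]] for i in [0, n). The idx array itself is read
   sequentially (the hardware prefetcher can handle that), but the
   data accesses jump around, so hint them a few iterations ahead. */
static long sum_indirect(const long *data, const size_t *idx, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_AHEAD]], 0, 1);
        sum += data[idx[i]];
    }
    return sum;
}
```

If `data` is small enough to stay in cache anyway, the prefetches are pure overhead, so benchmark with and without them.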