Is there any data on AVX2 gather latency?
(for instance a _mm256_i32gather_ps instruction accessing a single cache line)
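Actually, this really depends on the hardware. If you look at Agner Fog's instruction tables, you'll see that no latency is listed for Zen1 and Zen2, but the reciprocal throughputs for VGATHERDPS are 13-20 and 9-16. For Intel processors, we have: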
               xmm                  ymm
Processor      throughput  latency  throughput  latency
--------------------------------------------------------
Haswell        9           -        12          -
Broadwell      6           -        7           -
Skylake        4           12       5           13
SkylakeX       4           12       5           13
Coffee Lake    4           12       5           13
Also, Intel's website no longer lists throughput/latency for the AVX2 gather instructions, but it does list some for AVX512.
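Since the numbers differ per microarchitecture, it may be easiest to measure on your own machine. Below is a minimal sketch of one way to do that, not a calibrated benchmark: it builds a dependent chain in which each `_mm256_i32gather_ps` result is reinterpreted as the index vector of the next gather, so the gathers cannot overlap and the time per iteration approximates latency. The function names `gather_chain_ticks` and `measure_gather_latency` are made up for this example, the 16-float table is sized to fit one 64-byte cache line as in the question, and GCC or Clang on x86-64 is assumed. `__rdtsc()` counts reference ticks rather than core clock cycles, so the result is only roughly comparable to the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>   /* __rdtsc and the AVX2 intrinsics (GCC/Clang) */

/* Hypothetical helper: runs `iters` dependent gathers and returns TSC ticks
 * per gather. Compiled for AVX2 regardless of the global -m flags. */
__attribute__((target("avx2")))
static double gather_chain_ticks(long iters, int32_t out_idx[8])
{
    /* 16 floats = 64 bytes, aligned so every access hits a single cache line. */
    __attribute__((aligned(64))) float table[16];

    /* Element i holds the raw bits of the integer (i + 1) mod 16, so a
     * gathered value can be reused directly as the next index. */
    for (int32_t i = 0; i < 16; ++i) {
        int32_t next = (i + 1) & 15;
        memcpy(&table[i], &next, sizeof next);
    }
    /* Compiler barrier: treat the table contents as unknown from here on. */
    __asm__ volatile("" : : "r"(table) : "memory");

    __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);

    uint64_t t0 = __rdtsc();
    for (long i = 0; i < iters; ++i) {
        /* Each gather needs the previous result as its index vector. */
        __m256 v = _mm256_i32gather_ps(table, idx, 4);
        idx = _mm256_castps_si256(v);   /* reinterpretation only, no instruction */
    }
    uint64_t t1 = __rdtsc();

    _mm256_storeu_si256((__m256i *)out_idx, idx);
    return (double)(t1 - t0) / (double)iters;
}

/* Hypothetical wrapper: returns -1.0 if the CPU has no AVX2 support. */
double measure_gather_latency(long iters, int32_t out_idx[8])
{
    assert(iters > 0);
    if (!__builtin_cpu_supports("avx2"))
        return -1.0;
    return gather_chain_ticks(iters, out_idx);
}
```

Calling `measure_gather_latency(100000000, out)` a few times and taking the smallest result should give a rough latency figure; storing the final index vector in `out` keeps the chain from being optimized away. To estimate throughput instead, you would issue several independent gathers per iteration so they can overlap.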