Is there any data on AVX2 gather latency?
(for instance a _mm256_i32gather_ps instruction accessing a single cache line)
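It really depends on the hardware. If you look at Agner Fog's instruction tables, you will see that no latency is listed for Zen1 and Zen2, only reciprocal throughputs of 13-20 and 9-16 for VGATHERDPS. For Intel processors we have (reciprocal throughput and latency, in clock cycles):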
                     xmm                 ymm
Processor    throughput latency  throughput latency
-------------------------------------------------------
Haswell          9                    12
Broadwell        6                     7
Skylake          4         12          5       13
SkylakeX         4         12          5       13
Coffee Lake      4         12          5       13
Also, Intel's website no longer lists throughput/latency for the AVX2 gather instructions, but it does have some figures for AVX512.
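If you want a number for your own machine rather than a table value, a dependent chain of gathers gives a rough latency estimate. The following is only a minimal sketch under several assumptions: the function names `init_table` and `chase_gathers` are made up for illustration, it uses the integer form `_mm256_i32gather_epi32` instead of `_mm256_i32gather_ps` so that each result can be fed straight back in as the next index vector (the float form may time differently), and it relies on GCC/Clang extensions (`__attribute__`, `__rdtsc`). All 16 table entries sit in one 64-byte aligned cache line, matching the case in the question.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <x86intrin.h>

/* One 64-byte cache line holding 16 ints. */
__attribute__((aligned(64))) int32_t table[16];

/* table[i] = (i + 1) % 16, so every gathered value is again a valid
 * index into the same cache line. Filled at run time so the compiler
 * cannot fold the loads away. */
void init_table(void)
{
    for (int i = 0; i < 16; i++)
        table[i] = (i + 1) % 16;
}

/* Hypothetical helper: runs `iters` gathers where each one depends on
 * the result of the previous one, and returns the elapsed TSC ticks.
 * The final index vector is stored to `out` (8 ints) so the chain is
 * observable and cannot be optimised out. */
__attribute__((target("avx2")))
uint64_t chase_gathers(int iters, int32_t *out)
{
    assert(iters > 0);
    __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);

    uint64_t start = __rdtsc();
    for (int i = 0; i < iters; i++) {
        /* scale = 4 bytes per element; the result becomes the next index */
        idx = _mm256_i32gather_epi32(table, idx, 4);
    }
    uint64_t stop = __rdtsc();

    _mm256_storeu_si256((__m256i *)out, idx);
    return stop - start;
}
```

Call `init_table()` once, check `__builtin_cpu_supports("avx2")`, then divide the value returned by `chase_gathers(iters, out)` by `iters`. Note that the TSC usually ticks at a fixed reference frequency, not the core clock, so the result has to be scaled before comparing it with the cycle counts in the table, and it should be averaged over many runs.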