8

I found that my MMIO read/write latency is unreasonably high. I hope someone can give me some suggestions.

In kernel space, I wrote a simple program to read a 4-byte value at a PCIe device's BAR0 address. The device is an Intel 10G NIC plugged into a PCIe x16 slot on my Xeon E5 server. I use rdtsc to measure the time between the beginning and the end of the MMIO read; a code snippet looks like this:

vaddr = ioremap_nocache(0xf8000000, 128); // 0xf8000000 is BAR0 of the device
rdtscl(init);        // timestamp before the read
ret = readl(vaddr);  // 4-byte MMIO read from BAR0
rmb();               // keep the read from being reordered past the second timestamp
rdtscl(end);         // timestamp after the read

I expected the elapsed time between init and end to be less than 1us; after all, the data traversing the PCIe link should only take a few nanoseconds. However, my test results show at least 5.5us for an MMIO PCIe device read. I'm wondering whether this is reasonable. I changed my code to remove the memory barrier (rmb), but still get around 5us latency.
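(As a rough sanity check, assuming for illustration a 2.5 GHz TSC, 5.5us corresponds to end - init ≈ 5.5e-6 × 2.5e9 ≈ 13,750 cycles, whereas 1us would be only about 2,500 cycles.)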

This paper mentions PCIe latency measurement; usually it's less than 1us: www.cl.cam.ac.uk/~awm22/.../miller2009motivating.pdf Do I need to do any special configuration of the kernel or the device to get lower MMIO access latency? Or does anyone have experience doing this before?


2 Answers

2

5 microseconds is great! Run this in a loop and collect statistics, and you may well find much larger values.
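A minimal sketch of such a measurement loop (assuming the same vaddr, readl() and rdtscl() setup as in the question; the iteration count and statistics are purely illustrative):

u32 init, end, delta, min = ~0U, max = 0;
u64 sum = 0;
int i;
const int iters = 1000;              /* illustrative iteration count */

for (i = 0; i < iters; i++) {
    rdtscl(init);                    /* timestamp before the read */
    (void)readl(vaddr);              /* the 4-byte MMIO read being timed */
    rmb();                           /* order the read before the second timestamp */
    rdtscl(end);                     /* timestamp after the read */

    delta = end - init;              /* elapsed TSC cycles for this read */
    sum += delta;
    if (delta < min) min = delta;
    if (delta > max) max = delta;
}
pr_info("readl cycles over %d runs: min=%u avg=%llu max=%u\n",
        iters, min, sum / iters, max);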

There are a couple of reasons. BARs are usually non-cacheable and non-prefetchable - check yours with pci_resource_flags(). If the BAR were marked cacheable, then cache coherency - the process of ensuring all CPUs have the same cached value - could be an issue.
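For example, roughly like this (assuming pdev is the NIC's struct pci_dev, which is not shown in the question):

unsigned long flags = pci_resource_flags(pdev, 0);   /* flags for BAR0 */

if (!(flags & IORESOURCE_MEM))
    pr_info("BAR0 is not a memory BAR\n");
if (flags & IORESOURCE_PREFETCH)
    pr_info("BAR0 is prefetchable\n");
else
    pr_info("BAR0 is non-prefetchable\n");
if (flags & IORESOURCE_CACHEABLE)
    pr_info("BAR0 is marked cacheable\n");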

Second, an I/O read is never a posted transaction. The CPU has to stall until it gets permission to talk on some data bus, and stall some more until the data arrives on said bus. This bus is made to look like memory, but in fact it is not, and the stall may be an uninterruptible busy wait; it is unproductive nonetheless. So even before you start considering task preemption, I would expect the worst-case latency to be well above 5us.

answered 2016-04-06T14:31:47.753
-1

5.5us is a reasonable read time if the NIC has to fetch the data from a remote host over the network, possibly through a switch. A register read on a local PCIe device should take less than 1us. I have no experience with the Intel 10G NIC, but I have worked with Infiniband and custom cards.

answered 2013-09-05T20:27:24.880