8

I found that my MMIO read/write latency is unreasonably high. I hope someone can give me some suggestions.

In kernel space, I wrote a simple program to read a 4-byte value at a PCIe device's BAR0 address. The device is an Intel 10G NIC plugged into a PCIe x16 slot on my Xeon E5 server. I use rdtsc to measure the time between the beginning and the end of the MMIO read; a code snippet looks like this:

vaddr = ioremap_nocache(0xf8000000, 128); // 0xf8000000 is BAR0 of the device
rdtscl(init);        // timestamp before the read
ret = readl(vaddr);  // 4-byte MMIO read from BAR0
rmb();               // keep the read from being reordered past the second timestamp
rdtscl(end);         // timestamp after the read

I expected the elapsed time between init and end to be less than 1us; after all, the data traversing the PCIe link should only take a few nanoseconds. However, my test results show at least 5.5us for an MMIO PCIe device read. I'm wondering whether this is reasonable. I changed my code to remove the memory barrier (rmb), but still get around 5us latency.
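(As a rough sanity check, assuming for illustration a 2.5 GHz TSC, 5.5us corresponds to end - init ≈ 5.5e-6 × 2.5e9 ≈ 13,750 cycles, whereas 1us would be only about 2,500 cycles.)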

This paper mentions PCIe latency measurement; usually it's less than 1us: www.cl.cam.ac.uk/~awm22/.../miller2009motivating.pdf Do I need to do any special configuration of the kernel or the device to get lower MMIO access latency? Or does anyone have experience doing this before?


2 Answers

2

5 microseconds is great! Run this in a loop and collect statistics, and you may well find much larger values.
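A minimal sketch of such a measurement loop (assuming the same vaddr, readl() and rdtscl() setup as in the question; the iteration count and statistics are purely illustrative):

u32 init, end, delta, min = ~0U, max = 0;
u64 sum = 0;
int i;
const int iters = 1000;              /* illustrative iteration count */

for (i = 0; i < iters; i++) {
    rdtscl(init);                    /* timestamp before the read */
    (void)readl(vaddr);              /* the 4-byte MMIO read being timed */
    rmb();                           /* order the read before the second timestamp */
    rdtscl(end);                     /* timestamp after the read */

    delta = end - init;              /* elapsed TSC cycles for this read */
    sum += delta;
    if (delta < min) min = delta;
    if (delta > max) max = delta;
}
pr_info("readl cycles over %d runs: min=%u avg=%llu max=%u\n",
        iters, min, sum / iters, max);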

There are a couple of reasons. BARs are usually non-cacheable and non-prefetchable - check yours with pci_resource_flags(). If the BAR were marked cacheable, then cache coherency - the process of ensuring all CPUs have the same cached value - could be an issue.
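For example, roughly like this (assuming pdev is the NIC's struct pci_dev, which is not shown in the question):

unsigned long flags = pci_resource_flags(pdev, 0);   /* flags for BAR0 */

if (!(flags & IORESOURCE_MEM))
    pr_info("BAR0 is not a memory BAR\n");
if (flags & IORESOURCE_PREFETCH)
    pr_info("BAR0 is prefetchable\n");
else
    pr_info("BAR0 is non-prefetchable\n");
if (flags & IORESOURCE_CACHEABLE)
    pr_info("BAR0 is marked cacheable\n");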

Second, an I/O read is never a posted transaction. The CPU has to stall until it gets permission to talk on some data bus, and stall some more until the data arrives on said bus. This bus is made to look like memory, but in fact it is not, and the stall may be an uninterruptible busy wait; it is unproductive nonetheless. So even before you start considering task preemption, I would expect the worst-case latency to be well above 5us.

answered 2016-04-06T14:31:47.753
-1

5.5us is a reasonable read time if the NIC has to fetch the data from a remote host over the network, possibly through a switch. A register read on a local PCIe device should take less than 1us. I have no experience with the Intel 10G NIC, but I have worked with Infiniband and custom cards.

answered 2013-09-05T20:27:24.880