
Recently I have been trying to debug an NVMe timeout issue:

# dd if=/dev/urandom of=/dev/nvme0n1 bs=4k count=1024000 
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x2010
nvme nvme0: Shutdown timeout set to 8 seconds
nvme nvme0: 1/0/0 default/read/poll queues 
nvme nvme0: I/O 388 QID 1 timeout, disable controller
blk_update_request: I/O error, dev nvme0n1, sector 64008 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
......

After some digging, I found that the root cause is the pcie-controller's "ranges" DTS property, which is used for the PIO/outbound mapping:

non-prefetch MMIO flags   DMA/PCI/bus address   CPU/physical address   size
<0x02000000               0x00 0x08000000       0x20 0x04000000        0x00 0x04000000>;   dd timeout
<0x02000000               0x00 0x04000000       0x20 0x04000000        0x00 0x04000000>;   dd ok
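
To make the columns concrete, here is a small standalone C sketch of how I read those two entries (my own decoding, assuming the standard layout of 3 PCI address cells, 2 CPU address cells and 2 size cells); it just prints the windows each entry describes:

/* Decode the two ranges entries from the table above and show that
 * only the bus-side window differs. */
#include <stdint.h>
#include <stdio.h>

struct pci_range {
    uint32_t flags;    /* 0x02000000 = 32-bit non-prefetchable MMIO */
    uint64_t bus_addr; /* address as seen on the PCIe bus */
    uint64_t cpu_addr; /* physical address the CPU accesses */
    uint64_t size;
};

static void show(const char *tag, struct pci_range r)
{
    printf("%-8s bus [0x%08llx..0x%08llx)  cpu [0x%010llx..0x%010llx)\n",
           tag,
           (unsigned long long)r.bus_addr,
           (unsigned long long)(r.bus_addr + r.size),
           (unsigned long long)r.cpu_addr,
           (unsigned long long)(r.cpu_addr + r.size));
}

int main(void)
{
    /* <0x02000000  0x00 0x08000000  0x20 0x04000000  0x00 0x04000000> */
    struct pci_range timeout = { 0x02000000, 0x08000000, 0x2004000000ULL, 0x04000000 };
    /* <0x02000000  0x00 0x04000000  0x20 0x04000000  0x00 0x04000000> */
    struct pci_range ok      = { 0x02000000, 0x04000000, 0x2004000000ULL, 0x04000000 };

    show("timeout", timeout);
    show("ok", ok);
    /* Same CPU aperture, same 64 MiB size; the only difference is where
     * the window lands in PCI bus address space, i.e. which addresses
     * the NVMe BAR can be assigned. */
    return 0;
}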

As the table above shows, the only difference is the bus address: 0x08000000 in one entry and 0x04000000 in the other. The former causes the NVMe timeout during dd, while the latter does not. The PCIe IP is Cadence's, and the controller's platform driver is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/cadence/pcie-cadence-plat.c
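
For reference on how that property reaches the hardware: the host driver (or the common bridge probe code acting on its behalf) walks ranges with the generic OF helper, and each parsed entry becomes an outbound window translating the CPU aperture to the given bus address. Below is a kernel-style sketch of that walk; this is the generic pattern from include/linux/of_address.h, and I have not verified it is literally the path pcie-cadence-plat.c takes, so treat it as illustration only:

#include <linux/errno.h>
#include <linux/of_address.h>

/* Sketch: iterate the parsed ranges entries of a PCI host node. */
static int sketch_parse_ranges(struct device_node *np)
{
	struct of_pci_range_parser parser;
	struct of_pci_range range;

	if (of_pci_range_parser_init(&parser, np))
		return -EINVAL;

	for_each_of_pci_range(&parser, &range) {
		/* For my two entries:
		 *   range.cpu_addr = 0x2004000000 (same in both)
		 *   range.pci_addr = 0x08000000 vs 0x04000000
		 *   range.size     = 0x04000000
		 * The controller is then programmed with an outbound
		 * region mapping [cpu_addr, cpu_addr + size) to
		 * [pci_addr, pci_addr + size) on the bus.
		 */
	}
	return 0;
}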

So, my questions are:

  1. Why does the bus address 0x08000000 cause the NVMe timeout?
  2. How can the NVMe timeout during dd be affected by the PCIe controller's "ranges" DTS property?