Short answer: the data transfer uses DMA, not MMIO.
Here is Keith Busch's answer:
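In general, the nvme driver notifies the controller of new commands by writing to specific nvme registers via MMIO. The nvme controller then fetches those commands from host memory using DMA.
One exception to this description is a controller that supports a CMB (Controller Memory Buffer) holding SQEs (Submission Queue Entries), but such controllers are not common. If you have one, the driver uses MMIO to write commands directly into controller memory instead of having the controller DMA them from host memory. Do you know whether you have such a controller?
The data transfers associated with your "dd" command will always use DMA.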
Here is the ftrace output.
Call stack leading to nvme_map_data:
# entries-in-buffer/entries-written: 376/376 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID TGID CPU# |||| TIMESTAMP FUNCTION
# | | | | |||| | |
kworker/u4:0-379 (-------) [000] ...1 3712.711523: nvme_map_data <-nvme_queue_rq
kworker/u4:0-379 (-------) [000] ...1 3712.711533: <stack trace>
=> nvme_map_data
=> nvme_queue_rq
=> blk_mq_dispatch_rq_list
=> __blk_mq_do_dispatch_sched
=> __blk_mq_sched_dispatch_requests
=> blk_mq_sched_dispatch_requests
=> __blk_mq_run_hw_queue
=> __blk_mq_delay_run_hw_queue
=> blk_mq_run_hw_queue
=> blk_mq_sched_insert_requests
=> blk_mq_flush_plug_list
=> blk_flush_plug_list
=> blk_mq_submit_bio
=> __submit_bio_noacct_mq
=> submit_bio_noacct
=> submit_bio
=> submit_bh_wbc.constprop.0
=> __block_write_full_page
=> block_write_full_page
=> blkdev_writepage
=> __writepage
=> write_cache_pages
=> generic_writepages
=> blkdev_writepages
=> do_writepages
=> __writeback_single_inode
=> writeback_sb_inodes
=> __writeback_inodes_wb
=> wb_writeback
=> wb_do_writeback
=> wb_workfn
=> process_one_work
=> worker_thread
=> kthread
=> ret_from_fork
Call graph of nvme_map_data:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | nvme_map_data [nvme]() {
0) | __blk_rq_map_sg() {
0) + 15.600 us | __blk_bios_map_sg();
0) + 19.760 us | }
0) | dma_map_sg_attrs() {
0) + 62.620 us | dma_direct_map_sg();
0) + 66.520 us | }
0) | nvme_pci_setup_prps [nvme]() {
0) | dma_pool_alloc() {
0) | _raw_spin_lock_irqsave() {
0) 1.880 us | preempt_count_add();
0) 5.520 us | }
0) | _raw_spin_unlock_irqrestore() {
0) 1.820 us | preempt_count_sub();
0) 5.260 us | }
0) + 16.400 us | }
0) + 23.500 us | }
0) ! 150.100 us | }
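nvme_pci_setup_prps is one of the ways nvme sets up DMA. It builds the PRP (Physical Region Page) entries that tell the controller where the data buffer lives in memory. The SPDK documentation describes the requirements: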
NVMe devices transfer data to and from system memory using Direct Memory Access (DMA). Specifically, they send messages across the PCI bus requesting data transfers. In the absence of an IOMMU, these messages contain physical memory addresses. These data transfers happen without involving the CPU, and the MMU is responsible for making access to memory coherent.
NVMe devices also may place additional requirements on the physical layout of memory for these transfers. The NVMe 1.0 specification requires all physical memory to be describable by what is called a PRP list. To be described by a PRP list, memory must have the following properties:
- The memory is broken into physical 4KiB pages, which we'll call device pages.
- The first device page can be a partial page starting at any 4-byte aligned address. It may extend up to the end of the current physical page, but not beyond.
- If there is more than one device page, the first device page must end on a physical 4KiB page boundary.
- The last device page begins on a physical 4KiB page boundary, but is not required to end on a physical 4KiB page boundary.
https://spdk.io/doc/memory.html