linux-kernel - NULL pointer dereference in swiotlb_unmap_sg_attrs() on disk IO

Question

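When reading or writing files through my PCIe block device driver, I hit an error that I really don't understand. The problem seems to be in swiotlb_unmap_sg_attrs(), which appears to dereference a NULL sg pointer, but I don't know where that pointer comes from: the only scatterlist I use myself is allocated as part of the device info structure, which persists for as long as the driver does.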

There is a stack trace for this problem. It tends to vary in the exact details, but it always crashes in swiotlb_unmap_sg_attrs().

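I think I may have a locking problem, because I am not sure how to handle locking around the IO functions. The lock is already held when the request function is called, and I release it just before calling the IO functions themselves, because they need an (MSI) IRQ to complete. The IRQ handler updates a "status" value that the IO function is waiting on. When the IO function returns, I take the lock back and return to request queue processing.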

The crash happens in blk_fetch_request() during the following:

if (!__blk_end_request(req, res, bytes)){
    printk(KERN_ERR "%s next request\n", DRIVER_NAME);

    req = blk_fetch_request(q);
} else {
    printk(KERN_ERR "%s same request\n", DRIVER_NAME);
}

where bytes is updated by the request handler to the total length of the IO (the sum of the lengths of all scatter-gather segments).

score 0 · Accepted Answer

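It turns out this was caused by re-entrancy of the request function. Because I unlock in the middle to let the IRQ come in, the request function can be called again and take the lock while the original request handler is still waiting on its IO. The wrong handler then gets the IRQ, and everything goes downhill with a stack of failed IO.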

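The way I solved this was to set a "busy" flag at the start of the request function, clear it at the end, and return immediately at the start of the function if the flag is already set: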

static void mydev_submit_req(struct request_queue *q){
    struct mydevice *dev = q->queuedata;

    // We are already processing a request
    // so reentrant calls can take a hike
    // They'll be back
    if (dev->has_request)
        return;

    // We own the IO now, new requests need to wait
    // Queue lock is held when this function is called
    // so no need for an atomic set
    dev->has_request = 1; 

    // Access request queue here, while queue lock is held

    spin_unlock_irq(q->queue_lock);

    // Perform IO here, with IRQs enabled
    // You can't access the queue or request here, make sure 
    // you got the info you need out before you release the lock

    spin_lock_irq(q->queue_lock);

    // you can end the requests as needed here, with the lock held

    // allow new requests to be processed after we return
    dev->has_request = 0;

    // lock is held when the function returns
}

However, I am still not sure why I consistently got the stack trace from swiotlb_unmap_sg_attrs().
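For anyone who wants to see the effect of the busy flag without a kernel build, here is a rough user-space model of the pattern. It is only a sketch under stated assumptions: mock_dev, mock_lock(), mock_do_io() and run_mock() are hypothetical names, not kernel APIs; the queue lock is modeled as a plain flag because the model is single-threaded; and the re-entrant call is simulated by having the IO step call the request function again while the lock is dropped. The loop over pending requests is also my assumption about how the queue is drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the driver state; not kernel code. */
struct mock_dev {
    bool lock_held;     /* models q->queue_lock (single-threaded, so a flag) */
    bool has_request;   /* the "busy" flag, only changed with the lock held */
    bool use_guard;     /* toggles the early return, to compare both cases */
    int pending;        /* requests still waiting on the mock queue */
    int completed;      /* requests ended so far */
    int in_flight;      /* IO operations currently running */
    int max_in_flight;  /* highest overlap seen; > 1 means overlapping IO */
};

static void mock_lock(struct mock_dev *dev)
{
    assert(!dev->lock_held);    /* catches a double lock */
    dev->lock_held = true;
}

static void mock_unlock(struct mock_dev *dev)
{
    assert(dev->lock_held);     /* catches an unbalanced unlock */
    dev->lock_held = false;
}

static void mock_submit_req(struct mock_dev *dev);

/* Models the IO step, which runs with the lock dropped. */
static void mock_do_io(struct mock_dev *dev)
{
    assert(!dev->lock_held);
    dev->in_flight++;
    if (dev->in_flight > dev->max_in_flight)
        dev->max_in_flight = dev->in_flight;

    /* Simulate another caller taking the lock and invoking the
     * request function again while this IO is still outstanding. */
    mock_lock(dev);
    mock_submit_req(dev);
    mock_unlock(dev);

    dev->in_flight--;
}

/* Models the request function: called and returns with the lock held. */
static void mock_submit_req(struct mock_dev *dev)
{
    assert(dev->lock_held);

    /* Already processing a request: re-entrant calls return at once. */
    if (dev->use_guard && dev->has_request)
        return;
    dev->has_request = true;

    while (dev->pending > 0) {
        dev->pending--;         /* take a request while the lock is held */
        mock_unlock(dev);
        mock_do_io(dev);        /* lock dropped, re-entry is possible here */
        mock_lock(dev);
        dev->completed++;       /* end the request with the lock held */
    }

    dev->has_request = false;
}

/* Runs nreq requests and reports the highest number of overlapping IOs. */
int run_mock(bool use_guard, int nreq)
{
    struct mock_dev dev = { .use_guard = use_guard, .pending = nreq };

    mock_lock(&dev);
    mock_submit_req(&dev);
    mock_unlock(&dev);

    assert(dev.completed == nreq);
    return dev.max_in_flight;
}
```

In this model, run_mock(true, 3) returns 1 because every re-entrant call bails out at the flag check, while run_mock(false, 3) returns 3 because each re-entrant call starts another IO on top of the one still outstanding. That overlap is the situation in which the wrong handler can end up consuming the IRQ in the real driver.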
