synchronization - How do I synchronize the CPU and GPU using fences in DirectX / Direct3D 12?

Question

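I have started learning Direct3D 12 and am having trouble understanding CPU-GPU synchronization. As far as I understand, a fence (ID3D12Fence) is nothing more than a UINT64 (unsigned long long) value used as a counter. But its methods confuse me. Below is part of the source code of a D3D12 sample ( https://github.com/d3dcoder/d3d12book ).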

void D3DApp::FlushCommandQueue()
{
    // Advance the fence value to mark commands up to this fence point.
    mCurrentFence++;

    // Add an instruction to the command queue to set a new fence point.  Because we 
    // are on the GPU timeline, the new fence point won't be set until the GPU finishes
    // processing all the commands prior to this Signal().
    ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), mCurrentFence));

    // Wait until the GPU has completed commands up to this fence point.
    if(mFence->GetCompletedValue() < mCurrentFence)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS);

        // Fire event when GPU hits current fence.  
        ThrowIfFailed(mFence->SetEventOnCompletion(mCurrentFence, eventHandle));

        // Wait until the GPU hits current fence event is fired.
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

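As I understand it, this part is trying to "flush" the command queue, which basically makes the CPU wait until the GPU reaches the given fence value, so that the CPU and GPU have the same fence value.

Q: If Signal() is a function that lets the GPU update the fence value inside the given ID3D12Fence, why is the mCurrentFence value needed?

According to the Microsoft docs, it "updates a fence to a specified value". What specified value? What I need is "get the value of the last completed command list", not set or specify one. What is this specified value for?

To me, it seems like it should be something like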

// Suppose mCurrentFence is 1 after submitting 1 command list (Index 0), and the thread reached to here for the FIRST time
ThrowIfFailed(mCommandQueue->Signal(mFence.Get()));
// At this point Fence value inside mFence is updated
if (m_Fence->GetCompletedValue() < mCurrentFence)
{
...
}

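If m_Fence->GetCompletedValue() is 0,

if (0 < 1)

the GPU has not yet processed the command list (index 0), so the CPU must wait until the GPU catches up. Calling SetEventOnCompletion, WaitForSingleObject, etc. then makes sense.

if (1 < 1)

the GPU has completed the command list (index 0), so the CPU does not need to wait.

mCurrentFence would then be incremented wherever a command list is executed.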

mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);
mCurrentFence++;

score 3 · Accepted Answer

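mCommandQueue->Signal(mFence.Get(), mCurrentFence) sets the fence value to mCurrentFence once all previously queued commands on the command queue have been executed. In this case, the "specified value" is mCurrentFence.

At the start, both the fence value and mCurrentFence are 0. Next, mCurrentFence is set to 1. Then we call mCommandQueue->Signal(mFence.Get(), 1), which sets the fence to 1 once everything on that queue has been executed. Finally we call mFence->SetEventOnCompletion(1, eventHandle) followed by WaitForSingleObject to wait until the fence is set to 1.

Replace 1 with 2 for the next iteration, and so on.

Note that mCommandQueue->Signal is a non-blocking operation. It does not set the fence value immediately, only after all previously queued GPU commands have been executed. You can assume that m_Fence->GetCompletedValue() < mCurrentFence is almost always true in this example, because the check runs right after Signal.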
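To make the "non-blocking, set later" behavior concrete, here is a small simulation in standard C++. It is only a sketch and not the D3D12 API: MockFence, MockQueue, step() and observedRightAfterSignal() are hypothetical names, and step() stands in for the GPU finishing one queued operation.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy stand-in for ID3D12Fence: a counter that only the simulated GPU writes.
struct MockFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Toy stand-in for a command queue. Work and signals are processed strictly
// in submission order, one operation per call to step().
class MockQueue {
    struct Op { MockFence* fence; std::uint64_t value; }; // fence == nullptr means "work"
    std::deque<Op> ops_;

public:
    void ExecuteWork() { ops_.push_back({nullptr, 0}); }

    // Non-blocking: only records "write value into fence when the GPU gets here".
    void Signal(MockFence* fence, std::uint64_t value) { ops_.push_back({fence, value}); }

    // Simulates the GPU finishing the next queued operation.
    bool step() {
        if (ops_.empty()) return false;
        Op op = ops_.front();
        ops_.pop_front();
        if (op.fence != nullptr) op.fence->completed = op.value;
        return true;
    }
};

// Replays the walkthrough above and returns the fence value that the CPU
// observes immediately after calling Signal().
std::uint64_t observedRightAfterSignal() {
    MockFence fence;
    MockQueue queue;
    std::uint64_t currentFence = 0;

    queue.ExecuteWork();
    ++currentFence;                      // CPU decides the next fence point is 1
    queue.Signal(&fence, currentFence);  // returns at once, fence not updated yet

    std::uint64_t seen = fence.GetCompletedValue();

    // Stand-in for SetEventOnCompletion + WaitForSingleObject: let the
    // simulated GPU run until the fence reaches the requested value.
    while (fence.GetCompletedValue() < currentFence) queue.step();
    assert(fence.GetCompletedValue() == currentFence);
    return seen;
}
```

In this model observedRightAfterSignal() returns 0: the value passed to Signal() is what the fence will hold later, not what it holds when Signal() returns.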

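Why is the mCurrentFence value needed?

I suppose it is not strictly necessary, but tracking the fence value this way avoids an additional API call. In this case, you could also do: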

// retrieve last value of the fence and increment by one (Additional API call)
auto nextFence = mFence->GetCompletedValue() + 1;
ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), nextFence));

// Wait until the GPU has completed commands up to this fence point.
if(mFence->GetCompletedValue() < nextFence)
{
    HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS);  
    ThrowIfFailed(mFence->SetEventOnCompletion(nextFence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
}
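As a hedged aside, the two approaches are only interchangeable when every Signal() is waited on before the next one is issued, as FlushCommandQueue does. The sketch below models just the arithmetic for two Signal() calls issued back to back, assuming the GPU has not completed the first one yet. The function names are hypothetical, and `completed` stands in for what GetCompletedValue() would return.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Values two back-to-back Signal() calls would receive if each were computed
// as GetCompletedValue() + 1 while the GPU has not advanced in between.
std::pair<std::uint64_t, std::uint64_t> valuesFromCompleted(std::uint64_t completed) {
    std::uint64_t first = completed + 1;
    std::uint64_t second = completed + 1;  // completed is unchanged, so the value repeats
    return {first, second};
}

// Values the same two calls would receive from a CPU-side counter.
std::pair<std::uint64_t, std::uint64_t> valuesFromCounter(std::uint64_t& currentFence) {
    std::uint64_t first = ++currentFence;
    std::uint64_t second = ++currentFence;  // always distinct and increasing
    return {first, second};
}
```

With a repeated value, waiting on the second signal could return as soon as the first one completes, which is one reason to keep a counter on the CPU side.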

score 1 · Accepted Answer

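As an addition to Felix's answer:

Tracking the fence value (e.g. mCurrentFence) is useful for waiting on more specific points in the command queue.

For example, say we are using this setup: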

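ComPtr<ID3D12CommandQueue> queue;
ComPtr<ID3D12Fence> queueFence;
UINT64 fenceVal = 0;
// Auto-reset event reused by waitFor. Its creation is not shown in the
// original answer; this line is an assumption.
HANDLE fenceEv = CreateEvent(nullptr, FALSE, FALSE, nullptr);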

UINT64 incrementFence()
{
    fenceVal++;
    queue->Signal(queueFence.Get(), fenceVal); // CHECK HRESULT
    return fenceVal;
}

void waitFor(UINT64 fenceVal, DWORD timeout = INFINITE)
{
    if (queueFence->GetCompletedValue() < fenceVal)
    {
        queueFence->SetEventOnCompletion(fenceVal, fenceEv); // CHECK HRESULT
        WaitForSingleObject(fenceEv, timeout);
    }
}

Then we can do the following (pseudocode):

SUBMIT COMMANDS 1
cmds1Complete = incrementFence();
    .
    . <- CPU STUFF
    .
SUBMIT COMMANDS 2
cmds2Complete = incrementFence();
    .
    . <- CPU STUFF
    .
waitFor(cmds1Complete)
    .
    . <- CPU STUFF (that needs COMMANDS 1 to be complete,
      but COMMANDS 2 is NOT required to be completed [but also could be])
    .
waitFor(cmds2Complete)
    .
    . <- EVERYTHING COMPLETE
    .
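The pseudocode above can be replayed in a small single-threaded simulation. This is a sketch under stated assumptions, not D3D12 code: SimQueue and its members are hypothetical, and waitFor() runs the simulated GPU forward instead of blocking on an event.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical single-threaded model of a queue with one fence.
struct SimQueue {
    std::deque<std::uint64_t> pending;  // 0 = a batch of work, nonzero = a signal value
    std::uint64_t completed = 0;        // what GetCompletedValue() would report
    std::uint64_t fenceVal = 0;         // value tracked on the CPU side

    void submit() { pending.push_back(0); }

    // Queues a signal with the next tracked value and returns that value.
    std::uint64_t incrementFence() {
        pending.push_back(++fenceVal);
        return fenceVal;
    }

    // Stand-in for blocking: advances the simulated GPU only as far as needed
    // for the fence to reach value.
    void waitFor(std::uint64_t value) {
        while (completed < value && !pending.empty()) {
            std::uint64_t op = pending.front();
            pending.pop_front();
            if (op != 0) completed = op;
        }
    }
};

// Returns true if COMMANDS 2 is still outstanding after waiting for COMMANDS 1.
bool secondBatchStillPendingAfterFirstWait() {
    SimQueue q;
    q.submit();                                   // SUBMIT COMMANDS 1
    std::uint64_t cmds1Complete = q.incrementFence();
    q.submit();                                   // SUBMIT COMMANDS 2
    std::uint64_t cmds2Complete = q.incrementFence();

    q.waitFor(cmds1Complete);                     // only COMMANDS 1 must be done here
    bool secondPending = q.completed < cmds2Complete;

    q.waitFor(cmds2Complete);                     // now everything is complete
    assert(q.completed == cmds2Complete && q.pending.empty());
    return secondPending;
}
```

In this model the first wait stops as soon as the fence reaches cmds1Complete, so the second batch is still pending at that point. On real hardware it may or may not have finished by then, as the pseudocode notes.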

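Since we track fenceVal, we can also have a flush function that waits only on the tracked fenceVal (as opposed to a value returned earlier from incrementFence). This is essentially what you have in FlushCommandQueue: because it signals inline, the value it waits on is always the latest one (which is why, as Felix said, it just saves an API call):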

void flushCmdQueue()
{
    waitFor(incrementFence());
}

This example is somewhat more complex than the original question, but I think it is worth knowing when asking about mCurrentFence.
