concurrency - GLSL SpinLock only works most of the time

Question

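I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the visualization below, notice that the depth peeling works correctly overall: the first layer is top left, the second top right, the third bottom left and the fourth bottom right. The four depth layers are stored in a single RGBA texture.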
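In this scheme each RGBA texel acts as a tiny sorted list holding the four nearest depths. Below is a minimal CPU-side sketch in C++ (not GLSL) of the same insertion that the fragment program's insert() performs. The names Texel and insert_depth are hypothetical, and clearing the texel to 1.0 (the far plane) before the pass is an assumption. The update reads all four values, modifies them and writes them back. That read-modify-write is why concurrent fragments on the same pixel need a lock around it.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>

// CPU model of one RGBA32F texel: four depths kept in ascending order.
using Texel = std::array<float, 4>;

// Hypothetical equivalent of the shader's insert(): place new_depth in order,
// pushing farther depths back by one slot; the farthest one falls off.
Texel insert_depth(Texel data, float new_depth) {
    for (int i = 0; i < 4; ++i) {
        if (new_depth < data[i]) {
            for (int j = 3; j > i; --j) data[j] = data[j - 1];
            data[i] = new_depth;
            return data;
        }
    }
    return data;  // farther than all four stored layers: discarded
}
```

For example, starting from {1, 1, 1, 1} and inserting 0.5, 0.2, 0.9, 0.3 and 0.7 leaves {0.2, 0.3, 0.5, 0.7}. A further depth of 0.95 is dropped.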

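Unfortunately, the spinlock sometimes fails to prevent errors. You can see small white speckles, especially in the fourth layer, and there is also one on the spaceship's wing in the second layer. The speckles are different in every frame.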

[Image: the four peeled depth layers, with occasional white speckles]

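In my GLSL spinlock, when a fragment is drawn, the fragment program repeatedly performs an atomic read-and-write on a lock value stored in a separate locking texture until it reads a 0, which means the lock is free. In practice, I found that the whole program has to run inside the retry loop. If two threads in the same warp land on the same pixel, a simple spin cannot make progress: one thread must wait while the other proceeds, but all threads in a GPU warp execute in lockstep.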

My fragment program looks like this (comments and spacing added):

#version 420 core

//locking texture
layout(r32ui) coherent uniform uimage2D img2D_0;
//data texture, also render target
layout(RGBA32F) coherent uniform image2D img2D_1;

//Inserts "new_data" into "data", a sorted list
vec4 insert(vec4 data, float new_data) {
    if      (new_data<data.x) return vec4(      new_data,data.xyz);
    else if (new_data<data.y) return vec4(data.x,new_data,data.yz);
    else if (new_data<data.z) return vec4(data.xy,new_data,data.z);
    else if (new_data<data.w) return vec4(data.xyz,new_data      );
    else                      return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    //The idea here is to keep looping over a pixel until a value is written.
    //By looping over the entire logic, threads in the same warp aren't stalled
    //by other waiting threads.  The first imageAtomicExchange call sets the
    //locking value to 1.  If the locking value was already 1, then someone
    //else has the lock, and can_write is false.   If the locking value was 0,
    //then the lock is free, and can_write is true.  The depth is then read,
    //the new value inserted, but only written if can_write is true (the
    //locking texture was free).  The second imageAtomicExchange call resets
    //the lock back to 0.

    bool have_written = false;
    while (!have_written) {
        bool can_write = (imageAtomicExchange(img2D_0,coord,1u) != 1u);

        memoryBarrier();

        vec4 depths = imageLoad(img2D_1,coord);
        depths = insert(depths,gl_FragCoord.z);

        if (can_write) {
            imageStore(img2D_1,coord,depths);
            have_written = true;
        }

        memoryBarrier();

        imageAtomicExchange(img2D_0,coord,0);

        memoryBarrier();
    }
    discard; //Already wrote to render target with imageStore
}

My question is: why does this speckling happen? I expected the spinlock to work 100% of the time! Could it be related to where I placed the memoryBarrier() calls?

score 3 · Accepted Answer

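For reference, here is locking code that has been tested and works on a GTX670 with Nvidia drivers 314.22 and 320.18. Note that reordering the code, or rewriting it into logically equivalent code, triggers existing compiler optimization bugs (see the comments below). Also note that I use bindless image references below.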

// sem is initialized to zero
coherent uniform layout(size1x32) uimage2D sem;

void main(void)
{
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool done = false;
    uint locked = 0;
    while(!done)
    {
        // locked = imageAtomicCompSwap(sem, coord, 0u, 1u); will NOT work
        locked = imageAtomicExchange(sem, coord, 1u);
        if (locked == 0)
        {
            performYourCriticalSection();

            memoryBarrier();

            imageAtomicExchange(sem, coord, 0u);

            // replacing this with a break will NOT work
            done = true;
        }
    }

    discard;
}
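The locking logic of this answer's loop can be modelled on the CPU. The sketch below is C++ with std::atomic rather than GLSL. The names sem, shared_total, worker and run are hypothetical, and the acquire/release memory orders play roughly the role that memoryBarrier() plays in the shader. It does not reproduce the driver-specific compiler bugs the answer mentions (the imageAtomicCompSwap and break variants). It shows only the locking discipline: acquire with an exchange, and release only from the thread that read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// CPU stand-in for the per-pixel semaphore; 0 means free, 1 means held.
std::atomic<unsigned> sem{0};
long long shared_total = 0;  // plain, non-atomic: protected only by sem

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        bool done = false;
        while (!done) {
            // Acquire: exchange returns the previous value; 0 means it was free.
            unsigned locked = sem.exchange(1u, std::memory_order_acquire);
            if (locked == 0u) {
                shared_total += 1;  // critical section
                // Only the owner resets the lock, mirroring the answer.
                sem.exchange(0u, std::memory_order_release);
                done = true;
            }
        }
    }
}

// Runs `threads` workers, each entering the critical section `iterations` times.
long long run(int threads, int iterations) {
    shared_total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker, iterations);
    for (auto& th : pool) th.join();
    return shared_total;
}
```

With four threads each adding 20000 under the lock, run(4, 20000) returns 80000. Without the lock, the plain increment would be a data race and the total could come out lower.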

score 2 · Accepted Answer

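"imageAtomicExchange(img2D_0,coord,0);" needs to be inside the if statement. As written, it resets the lock variable even for threads that never acquired it. A thread that failed to get the lock still clears it, so a third thread can acquire it while the real owner is still between its imageLoad and imageStore. Both then write the same texel, and one depth is lost. Moving the reset inside the if fixes it.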
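To see why the unconditional reset breaks mutual exclusion, here is a minimal single-threaded C++ replay (not GLSL) of one possible interleaving at a single pixel. The Pixel struct and the exchange, try_acquire, release and replay functions are hypothetical names. Here exchange stands in for imageAtomicExchange, and the holders counter tracks how many threads currently believe they own the lock.

```cpp
#include <cassert>

// State of one pixel's lock word (img2D_0 in the question).
struct Pixel {
    unsigned lock = 0;
    int holders = 0;      // threads currently inside the critical section
    int max_holders = 0;  // worst overlap observed
};

// Sequential stand-in for imageAtomicExchange.
unsigned exchange(unsigned& word, unsigned value) {
    unsigned old = word;
    word = value;
    return old;
}

// First step of the question's loop body.
bool try_acquire(Pixel& p) {
    bool can_write = exchange(p.lock, 1u) != 1u;
    if (can_write) {
        p.holders += 1;
        if (p.holders > p.max_holders) p.max_holders = p.holders;
    }
    return can_write;
}

// Last step of the loop body. buggy == true mirrors the question: every
// thread resets the lock, even one that never acquired it.
void release(Pixel& p, bool can_write, bool buggy) {
    if (can_write) p.holders -= 1;
    if (can_write || buggy) exchange(p.lock, 0u);
}

// Replays threads A, B and C in one fixed order; returns the peak overlap.
int replay(bool buggy) {
    Pixel p;
    bool a = try_acquire(p);  // A takes the lock
    bool b = try_acquire(p);  // B fails because the lock is already 1...
    release(p, b, buggy);     // ...but the buggy version clears it anyway
    bool c = try_acquire(p);  // C now "acquires" while A is still inside
    release(p, c, buggy);
    release(p, a, buggy);
    return p.max_holders;
}
```

In this replay, replay(true) reports two simultaneous owners (A and C), which is the lost update that appears as a speckle. replay(false), with the reset moved inside the if, never exceeds one.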
