c++ - Synchronizing atomic counters across multiple GPUs

Question

I'm using an atomic counter (atomic_uint) in a compute shader, backed by a dynamically bound GL_ATOMIC_COUNTER_BUFFER (similar to the lighthouse3d opengl-atomic-counter tutorial).

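I use the atomic counter in a particle system to check whether all particles have met a condition: I expect counter == numParticles once every particle is in the correct position.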

Every frame I map the buffer and check whether the atomic counter has counted all particles:

GLuint *ptr = (GLuint *) glMapBuffer( GL_ATOMIC_COUNTER_BUFFER, GL_READ_ONLY );
GLuint particleCount = ptr[ 0 ];
glUnmapBuffer( GL_ATOMIC_COUNTER_BUFFER );
if( particleCount == numParticles() ){ /* do stuff */ }

On a single-GPU host the code works fine and particleCount always reaches numParticles(), but on a multi-GPU host particleCount never reaches numParticles().

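I can see visually that the condition has been reached and the test should be true, but particleCount goes up and down from frame to frame and never reaches numParticles().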

I have already tried an OpenGL memory barrier, GL_ATOMIC_COUNTER_BARRIER_BIT, before mapping the buffer to read particleCount:

glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);
GLuint *ptr = (GLuint *) glMapBuffer( GL_ATOMIC_COUNTER_BUFFER, GL_READ_ONLY );
GLuint particleCount = ptr[ 0 ];
glUnmapBuffer( GL_ATOMIC_COUNTER_BUFFER );
if( particleCount == m_particleSystem->numParticles() )
{ /* do stuff */ }

I have also tried a GLSL barrier before incrementing the counter in the compute shader:

memoryBarrierAtomicCounter();
atomicCounterIncrement( particleCount );

But the atomic counter does not seem to be synchronized across devices.

What is the correct way to synchronize so that atomic counters work across multiple devices?

score 2 · Accepted Answer

Your choice of memory barrier is actually inappropriate in this situation.

That barrier (GL_ATOMIC_COUNTER_BARRIER_BIT) makes changes to the atomic counter visible to later shader atomic-counter operations (e.g. by flushing caches and running shaders in a specific order), but it does not make sure that any concurrent shaders are complete before you map, read and unmap your buffer.

Since your buffer is being mapped and read back, you do not need that barrier - that barrier is for coherency between shader passes. What you really need is to ensure all shaders that access your atomic counter are finished before you try to read data using a GL command, and for this you need GL_BUFFER_UPDATE_BARRIER_BIT.

GL_BUFFER_UPDATE_BARRIER_BIT:

Reads/writes via glBuffer(Sub)Data, glCopyBufferSubData, glProgramBufferParametersNV, and glGetBufferSubData, or to buffer object memory mapped by glMapBuffer(Range) after the barrier will reflect data written by shaders prior to the barrier.

Additionally, writes via these commands issued after the barrier will wait on the completion of any shader writes to the same memory initiated prior to the barrier.

You may be thinking about barriers from the wrong perspective. The barrier bit you need is determined by the operation that reads the memory after the barrier, that is, the operation the shader writes must be made coherent with.
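The rule that the bit is chosen by the reading side can be written as a tiny lookup. This is only a sketch: `CounterReader` and `barrierFor` are hypothetical names, and the two enum values are copied from the OpenGL 4.2+ headers (glcorearb.h) so the snippet compiles without a GL loader.

```cpp
#include <cassert>

// Enum values as defined in the OpenGL headers; include your loader's header instead in real code.
using GLbitfield = unsigned int;
constexpr GLbitfield GL_BUFFER_UPDATE_BARRIER_BIT = 0x00000200;
constexpr GLbitfield GL_ATOMIC_COUNTER_BARRIER_BIT = 0x00001000;

// Hypothetical: who consumes the counter after the writing shader has run.
enum class CounterReader {
    ShaderAtomicOps,  // a later draw/dispatch accesses the buffer with atomicCounter*() functions
    BufferApi,        // glMapBuffer(Range), glGetBufferSubData, glCopyBufferSubData, ...
};

// Picks the glMemoryBarrier bit from the reader, not from the writer.
GLbitfield barrierFor(CounterReader reader) {
    switch (reader) {
        case CounterReader::ShaderAtomicOps: return GL_ATOMIC_COUNTER_BARRIER_BIT;
        case CounterReader::BufferApi:       return GL_BUFFER_UPDATE_BARRIER_BIT;
    }
    assert(false && "unhandled CounterReader");
    return 0;
}
```

For the per-frame glMapBuffer read-back in the question, `barrierFor(CounterReader::BufferApi)` yields GL_BUFFER_UPDATE_BARRIER_BIT. glMemoryBarrier takes a bitfield, so if the same buffer feeds both kinds of reader, the two bits can be OR-ed into a single call.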

I would suggest brushing up on the incoherent memory access use cases:

(1) Shader write/read between rendering commands

One rendering command writes incoherently and another reads. There is no need for coherent (the GLSL qualifier) here at all. Just use glMemoryBarrier before issuing the reading rendering command, with the appropriate access bit.

(2) Shader writes, other OpenGL operations read

Again, coherent is not necessary. You must use a glMemoryBarrier before performing the read, using a bitfield that is appropriate to the reading operation of interest.

In case (1), the barrier you want is in fact GL_ATOMIC_COUNTER_BARRIER_BIT, because it will force strict memory and execution order rules between different shader passes that share the same atomic counter.
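A minimal sketch of the case (1) ordering: one compute pass increments the counter, then glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT), then a second pass reads it through atomic counter operations. The `GlComputeApi` struct, `runTwoPasses` and the program handles are hypothetical; the GL entry points are injected only so the sketch compiles and its call order can be checked without a GL context, and the enum value is copied from the OpenGL headers.

```cpp
#include <cassert>

// GL typedefs and enum value copied from the OpenGL 4.2+ headers.
using GLuint = unsigned int;
using GLbitfield = unsigned int;
constexpr GLbitfield GL_ATOMIC_COUNTER_BARRIER_BIT = 0x00001000;

// Hypothetical injected entry points; real code calls glUseProgram,
// glDispatchCompute and glMemoryBarrier directly.
struct GlComputeApi {
    void (*useProgram)(GLuint program);
    void (*dispatchCompute)(GLuint x, GLuint y, GLuint z);
    void (*memoryBarrier)(GLbitfield barriers);
};

// Writer pass increments the shared atomic counter; reader pass consumes it
// with atomicCounter*() in a shader (both programs are assumed to bind the same buffer).
void runTwoPasses(const GlComputeApi& gl, GLuint writerProgram,
                  GLuint readerProgram, GLuint groups) {
    assert(gl.useProgram && gl.dispatchCompute && gl.memoryBarrier);

    gl.useProgram(writerProgram);
    gl.dispatchCompute(groups, 1, 1);

    // The next reader is another shader's atomic counter access,
    // so this is the matching barrier bit.
    gl.memoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);

    gl.useProgram(readerProgram);
    gl.dispatchCompute(groups, 1, 1);
}
```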

In case (2), the barrier you want is GL_BUFFER_UPDATE_BARRIER_BIT. The "reading operation of interest" is glMapBuffer (...) and as shown above, that is covered under GL_BUFFER_UPDATE_BARRIER_BIT.

In your situation, you are reading the buffer back using the GL API. You need GL commands to wait for all pending shaders to finish writing (this does not happen automatically for incoherent memory access - image load/store, atomic counters, etc.). That is textbook case (2).
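Applied to the question's per-frame read-back, a sketch of the case (2) fix might look like the following, assuming it runs after the glDispatchCompute that increments the counter. `GlBufferApi` and `readParticleCount` are hypothetical: the entry points are passed in only so the sketch compiles and its call order can be checked without a GL context, the enum values are copied from the OpenGL headers, and returning 0 when mapping fails is a simplification.

```cpp
#include <cassert>

// GL typedefs and enum values copied from the OpenGL 4.2+ headers
// (glcorearb.h) so the sketch compiles without a GL loader.
using GLenum = unsigned int;
using GLbitfield = unsigned int;
using GLuint = unsigned int;
using GLboolean = unsigned char;
constexpr GLenum GL_ATOMIC_COUNTER_BUFFER = 0x92C0;
constexpr GLenum GL_READ_ONLY = 0x88B8;
constexpr GLbitfield GL_BUFFER_UPDATE_BARRIER_BIT = 0x00000200;

// Hypothetical table of GL entry points; real code would call
// glMemoryBarrier, glMapBuffer and glUnmapBuffer directly.
struct GlBufferApi {
    void (*memoryBarrier)(GLbitfield barriers);
    void* (*mapBuffer)(GLenum target, GLenum access);
    GLboolean (*unmapBuffer)(GLenum target);
};

// Reads counter 0 from the buffer bound to GL_ATOMIC_COUNTER_BUFFER.
GLuint readParticleCount(const GlBufferApi& gl) {
    assert(gl.memoryBarrier && gl.mapBuffer && gl.unmapBuffer);

    // The data is consumed through glMapBuffer, so the barrier bit is the
    // one for buffer API access, not GL_ATOMIC_COUNTER_BARRIER_BIT.
    gl.memoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

    const auto* ptr = static_cast<const GLuint*>(
        gl.mapBuffer(GL_ATOMIC_COUNTER_BUFFER, GL_READ_ONLY));
    if (ptr == nullptr) {
        return 0;  // mapping failed, so there is nothing to unmap
    }
    const GLuint count = ptr[0];  // copy the value out before unmapping
    gl.unmapBuffer(GL_ATOMIC_COUNTER_BUFFER);
    return count;
}
```

The per-frame check then becomes `if (readParticleCount(api) == numParticles()) { /* do stuff */ }`, where `api` is filled with the real GL function pointers.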
