Your choice of memory barrier is actually inappropriate in this situation.
That barrier (GL_ATOMIC_COUNTER_BARRIER_BIT) would make changes to the atomic counter visible (e.g. flush caches and run shaders in a specific order), but what it does not do is make sure that any concurrent shaders are complete before you map, read and unmap your buffer.
Since your buffer is being mapped and read back, you do not need that barrier - that barrier is for coherency between shader passes. What you really need is to ensure all shaders that access your atomic counter are finished before you try to read data using a GL command, and for this you need GL_BUFFER_UPDATE_BARRIER_BIT.
GL_BUFFER_UPDATE_BARRIER_BIT:
Reads/writes via glBuffer(Sub)Data, glCopyBufferSubData, glProgramBufferParametersNV, and glGetBufferSubData, or to buffer object memory mapped by glMapBuffer(Range) after the barrier will reflect data written by shaders prior to the barrier.
Additionally, writes via these commands issued after the barrier will wait on the completion of any shader writes to the same memory initiated prior to the barrier.
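A rough sketch of the readback order this implies, not your actual code. The GlApi table, read_counter, and the fake_* functions are hypothetical names invented for this example; the entry points are passed in as function pointers only so the snippet compiles and can be dry-run without a GL context or a particular loader. With GLEW or glad you would call glMemoryBarrier, glBindBuffer, glMapBufferRange and glUnmapBuffer directly. The constant values are the ones from glext.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal GL types/constants (values as in glext.h). Drop this section
   when a real GL header is included. */
typedef unsigned int  GLenum;
typedef unsigned int  GLbitfield;
typedef unsigned int  GLuint;
typedef unsigned char GLboolean;
typedef ptrdiff_t     GLintptr;
typedef ptrdiff_t     GLsizeiptr;
#define GL_MAP_READ_BIT          0x0001
#define GL_ATOMIC_COUNTER_BUFFER 0x92C0
#ifndef GL_BUFFER_UPDATE_BARRIER_BIT
#define GL_BUFFER_UPDATE_BARRIER_BIT 0x00000200
#endif

/* Hypothetical table of GL entry points, filled in by your loader. */
typedef struct GlApi {
    void      (*MemoryBarrier)(GLbitfield barriers);
    void      (*BindBuffer)(GLenum target, GLuint buffer);
    void     *(*MapBufferRange)(GLenum target, GLintptr offset,
                                GLsizeiptr length, GLbitfield access);
    GLboolean (*UnmapBuffer)(GLenum target);
} GlApi;

/* Read one GLuint atomic counter after the draw/dispatch calls that
   increment it have been issued. Returns 0 on success, -1 on failure. */
static int read_counter(const GlApi *gl, GLuint counter_buffer, GLuint *out)
{
    /* Shaders wrote, a GL buffer command reads: the barrier must name
       the reading operation, i.e. the buffer update/map path. */
    gl->MemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

    gl->BindBuffer(GL_ATOMIC_COUNTER_BUFFER, counter_buffer);
    const GLuint *p = gl->MapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 0,
                                         (GLsizeiptr)sizeof(GLuint),
                                         GL_MAP_READ_BIT);
    if (p == NULL)
        return -1;
    *out = p[0];
    /* A false return from unmap means the data store was corrupted. */
    return gl->UnmapBuffer(GL_ATOMIC_COUNTER_BUFFER) ? 0 : -1;
}

/* --- Fake entry points: record call order for a dry run, no GL needed --- */
enum { CALL_BARRIER, CALL_BIND, CALL_MAP, CALL_UNMAP, MAX_CALLS = 16 };
static int        call_log[MAX_CALLS];
static int        call_count   = 0;
static GLbitfield last_barrier = 0;
static GLuint     fake_storage = 42;  /* pretend the shaders counted to 42 */

static void log_call(int id)
{
    assert(call_count < MAX_CALLS);
    call_log[call_count++] = id;
}
static void fake_MemoryBarrier(GLbitfield b)
{
    last_barrier = b;
    log_call(CALL_BARRIER);
}
static void fake_BindBuffer(GLenum target, GLuint buffer)
{
    (void)target; (void)buffer;
    log_call(CALL_BIND);
}
static void *fake_MapBufferRange(GLenum target, GLintptr offset,
                                 GLsizeiptr length, GLbitfield access)
{
    (void)target; (void)offset; (void)length; (void)access;
    log_call(CALL_MAP);
    return &fake_storage;
}
static GLboolean fake_UnmapBuffer(GLenum target)
{
    (void)target;
    log_call(CALL_UNMAP);
    return 1;
}

static const GlApi fake_gl = {
    fake_MemoryBarrier, fake_BindBuffer, fake_MapBufferRange, fake_UnmapBuffer
};
```

The point of the sketch is only the ordering: the glMemoryBarrier call comes after the shader work is issued and before the buffer is mapped. Calling read_counter(&fake_gl, some_buffer, &value) exercises that order against the fake table.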
You may be thinking about barriers from the wrong perspective. The barrier you need depends on which type of operation the memory read needs to be coherent to.
I would suggest brushing up on the incoherent memory access use cases:
(1) Shader write/read between rendering commands
One Rendering Command writes incoherently, and the other reads. There is no need for coherent (GLSL qualifier) here at all. Just use glMemoryBarrier before issuing the reading rendering command, using the appropriate access bit.
(2) Shader writes, other OpenGL operations read
Again, coherent is not necessary. You must use a glMemoryBarrier before performing the read, using a bitfield that is appropriate to the reading operation of interest.
In case (1), the barrier you want is in fact GL_ATOMIC_COUNTER_BARRIER_BIT, because it will force strict memory and execution order rules between different shader passes that share the same atomic counter.
In case (2), the barrier you want is GL_BUFFER_UPDATE_BARRIER_BIT. The "reading operation of interest" is glMapBuffer (...) and, as shown above, that is covered under GL_BUFFER_UPDATE_BARRIER_BIT.
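To make the rule of thumb concrete, here is a small sketch that picks the barrier bit from who reads the counter next. CounterReader and barrier_for are hypothetical names made up for this illustration, not part of OpenGL; the bit values match glext.h and are only defined here if no GL header has already provided them.

```c
/* Barrier bit values as in glext.h; skipped if a GL header defined them. */
#ifndef GL_ATOMIC_COUNTER_BARRIER_BIT
#define GL_ATOMIC_COUNTER_BARRIER_BIT 0x00001000
#endif
#ifndef GL_BUFFER_UPDATE_BARRIER_BIT
#define GL_BUFFER_UPDATE_BARRIER_BIT  0x00000200
#endif

/* Hypothetical: who consumes the counter after the shader has written it. */
typedef enum CounterReader {
    READER_LATER_SHADER_PASS, /* case (1): a later draw/dispatch reads it  */
    READER_GL_BUFFER_COMMAND  /* case (2): glMapBuffer(Range),
                                 glGetBufferSubData, ... read it           */
} CounterReader;

/* Returns the bit to pass to glMemoryBarrier before the read happens. */
static unsigned int barrier_for(CounterReader reader)
{
    switch (reader) {
    case READER_LATER_SHADER_PASS: return GL_ATOMIC_COUNTER_BARRIER_BIT;
    case READER_GL_BUFFER_COMMAND: return GL_BUFFER_UPDATE_BARRIER_BIT;
    }
    return 0; /* not reached for valid enum values */
}
```

The bit is chosen by the operation that reads, not by the operation that wrote. That is why a mapped readback of an atomic counter buffer still wants the buffer update bit.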
In your situation, you are reading the buffer back using the GL API. You need GL commands to wait for all pending shaders to finish writing (this does not happen automatically for incoherent memory access - image load/store, atomic counters, etc.). That is textbook case (2).