Question tag: [gpu-atomics]
gpgpu - Vulkan subgroupBarrier does not synchronize invocations
I have a somewhat complex procedure that contains a nested loop and a subgroupBarrier. In simplified form it looks like this:
Overall the procedure is correct and does what's expected of it. All subgroup threads always eventually reach the end condition. However, in my logs I see:
And it's not just a matter of the logs being displayed out of order. I perform an atomic addition and it seems to be wrong too. I need all threads to finish all their atomic operations before Finish! is printed. If subgroupBarrier() worked correctly, it should print 4, but in my case it prints 3.
I've been mostly following this tutorial
https://www.khronos.org/blog/vulkan-subgroup-tutorial
and it says that void subgroupBarrier() "performs a full memory and execution barrier - basically when an invocation returns from subgroupBarrier() we are guaranteed that every invocation executed the barrier before any return, and all memory writes by those invocations are visible to all invocations in the subgroup."
Interestingly, I tried changing if(gl_SubgroupInvocationID.x==0) to other numbers. For example, if(gl_SubgroupInvocationID.x==3) yields:

So it seems like subgroupBarrier() is entirely ignored.
Could the nested loop be the cause of the problem or is it something else?
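For reference, a minimal GLSL compute-shader sketch of the pattern described above; this is not the original code, and the shared counter, workgroup size, and debugPrintfEXT logging are assumptions:

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_EXT_debug_printf : enable

// Assumes the whole workgroup fits in a single subgroup.
layout(local_size_x = 4) in;

shared uint counter;

void main() {
    if (gl_SubgroupInvocationID == 0) {
        counter = 0;
    }
    subgroupBarrier();

    // Every invocation contributes exactly once.
    atomicAdd(counter, 1u);

    // Per the tutorial quote, all four additions should be
    // visible to every invocation past this point.
    subgroupBarrier();

    if (gl_SubgroupInvocationID == 0) {
        // Expected to print 4; the question reports 3 instead.
        debugPrintfEXT("Finish! counter=%u", counter);
    }
}
```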
Edit:
Here is more detailed code:
Basically, what this code does is equivalent to:
My code looks so convoluted because I wrote it in a way that is more parallelizable and tries to minimize the number of inactive threads (considering that threads in the same subgroup usually have to execute the same instruction).
I also added a bunch more debug prints and one more barrier just to see what happens. Here are the logs that I got:
cuda - CUDA atomicAdd_block is undefined
According to the CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA threads in the current program executing in the same thread block as the current thread. These are suffixed with _block, e.g., atomicAdd_block."
However, atomicAdd_block is undefined, while my code works when it uses atomicAdd. Is there any header or library I should include or link against?
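As a point of reference, the *_block atomic variants only exist on devices of compute capability 6.0 and higher, so nvcc must target such an architecture for the identifier to be defined at all. A minimal sketch, where the kernel and variable names are my own illustration rather than the asker's code:

```cuda
// Compile with an arch that has block-scoped atomics, e.g.:
//   nvcc -arch=sm_60 atomic_block_demo.cu -o atomic_block_demo
#include <cstdio>

__global__ void countPerBlock(int *perBlockTotal) {
    // Atomic only with respect to threads in the same block,
    // which can be cheaper than the device-wide atomicAdd.
    atomicAdd_block(&perBlockTotal[blockIdx.x], 1);
}

int main() {
    int *d_totals;
    cudaMalloc(&d_totals, 2 * sizeof(int));
    cudaMemset(d_totals, 0, 2 * sizeof(int));

    countPerBlock<<<2, 128>>>(d_totals);

    int h_totals[2];
    cudaMemcpy(h_totals, d_totals, sizeof(h_totals), cudaMemcpyDeviceToHost);
    printf("block 0: %d, block 1: %d\n", h_totals[0], h_totals[1]); // 128, 128
    cudaFree(d_totals);
    return 0;
}
```

As far as I know, no extra header or library is needed: the suffixed atomics are device built-ins that become available once the target architecture supports them.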
cuda - Is there a proper CUDA atomicLoad function?
I've run into the problem that the CUDA atomic API has no atomicLoad function. After searching on Stack Overflow, I found the following implementation of a CUDA atomicLoad:
But it looks like this function does not work correctly in the following example:
If you uncomment the section that uses atomicLoad, the application gets stuck...
Maybe I'm missing something? Is there a proper way to load an atomically modified variable?
PS: I know there is the cuda::atomic implementation, but my hardware does not support this API.
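For context, the atomicLoad implementations that circulate on Stack Overflow are typically a volatile read between fences, roughly as sketched below (my reconstruction, not necessarily the exact code the asker found). A read-modify-write that does not modify anything is an alternative that is atomic by construction:

```cuda
// Sketch of the volatile + fence pattern commonly posted as "atomicLoad":
__device__ int atomicLoad(const int *addr) {
    const volatile int *vaddr = addr; // volatile forces an actual memory read
    __threadfence();                  // make writes from other threads visible
    const int value = *vaddr;
    __threadfence();
    return value;
}

// Alternative: a compare-and-swap that never changes the stored value
// is effectively an atomic read.
__device__ int atomicLoadCAS(int *addr) {
    return atomicCAS(addr, 0, 0); // returns the old value, leaves it unchanged
}
```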
multithreading - Different parallel threads writing to the same memory location in OpenMP
I have some OpenMP code from a repository. Depending on the data, it happens that two threads write to the same location (specifically, it is a BFS graph algorithm). Functionally, it does not matter which thread writes last, since the same value is written to that location. But no atomics are used. Here is part of the code:
Also, no atomic or private clause is specified in the code, yet I get the correct output. I'd like to know how the threads manage this conflict. How does it translate to assembly code, and are implicit atomics involved? I checked the assembly and did not come across any fence-type instructions. Or can threads actually write to the same location at the same time if it is the same value?
Thanks
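To make the situation concrete, here is a minimal hypothetical sketch of the pattern in question (not the repository's code): two iterations can store the same value to the same element, with no atomics involved:

```cpp
// Hypothetical BFS-frontier sketch; not the repository's code.
// Compile with: g++ -fopenmp benign_race.cpp
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> level(8, -1);
    // Vertex 4 is discovered twice in this frontier.
    const int discovered[] = {4, 5, 4, 6};

    #pragma omp parallel for
    for (int i = 0; i < 4; ++i) {
        // Two threads may both execute this store for vertex 4. They store
        // the same value, so the result looks correct either way, but it is
        // still a plain (non-atomic, unfenced) store: OpenMP inserts no
        // implicit atomic here, matching the absence of fences in the assembly.
        level[discovered[i]] = 1;
    }

    for (int v = 0; v < 8; ++v)
        printf("level[%d] = %d\n", v, level[v]);
    return 0;
}
```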