I am trying to translate CUDA code into OpenCL and right now I am stuck with these functions/variables:
__syncthreads()
blockIdx.x
Actually, I found it on my own! Here is a useful article: http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf
The answer is: for __syncthreads() use barrier(CLK_LOCAL_MEM_FENCE); for blockIdx.x use get_group_id(0).
__syncthreads() -> barrier(_), but make sure you understand the difference between barrier(CLK_LOCAL_MEM_FENCE) and barrier(CLK_GLOBAL_MEM_FENCE). Check this question or this documentation for more info.
blockIdx.x -> get_group_id(0), which gives you the index of the work-group (block) along the first (x) dimension.
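To make the mapping concrete, here is a minimal sketch of a per-block sum, with the CUDA version in comments and an OpenCL C translation below it. The kernel name block_sum, the buffer names, and the fixed size of 256 in the CUDA comment are made up for illustration. It assumes the work-group size is a power of two and that the input length is a multiple of it.

```c
// CUDA original, for comparison (hypothetical example kernel):
//   __global__ void block_sum(const float *in, float *out) {
//       __shared__ float tmp[256];
//       int lid = threadIdx.x;
//       tmp[lid] = in[blockIdx.x * blockDim.x + lid];
//       __syncthreads();
//       for (int s = blockDim.x / 2; s > 0; s >>= 1) {
//           if (lid < s) tmp[lid] += tmp[lid + s];
//           __syncthreads();
//       }
//       if (lid == 0) out[blockIdx.x] = tmp[0];
//   }

// OpenCL C translation. tmp is local (CUDA: shared) memory whose size,
// one float per work-item, is set by the host when it binds the argument.
__kernel void block_sum(__global const float *in,
                        __global float *out,
                        __local float *tmp)
{
    size_t lid   = get_local_id(0);    // threadIdx.x
    size_t group = get_group_id(0);    // blockIdx.x
    size_t lsize = get_local_size(0);  // blockDim.x

    // Each work-item stages one element in local memory.
    tmp[lid] = in[group * lsize + lid];
    barrier(CLK_LOCAL_MEM_FENCE);      // __syncthreads()

    // Tree reduction; the barrier sits outside the if so that every
    // work-item in the group reaches it on every iteration.
    for (size_t s = lsize / 2; s > 0; s >>= 1) {
        if (lid < s)
            tmp[lid] += tmp[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 writes the group's result.
    if (lid == 0)
        out[group] = tmp[0];
}
```

The other index built-ins follow the same pattern: threadIdx.x becomes get_local_id(0) and blockDim.x becomes get_local_size(0).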
There are many pages on the web that can help you port CUDA to OpenCL (for example here). I just want to point out, as noted here for the barrier, that there are both barrier(CLK_LOCAL_MEM_FENCE) and barrier(CLK_GLOBAL_MEM_FENCE): the main difference is that the first ensures correct ordering of memory operations when you are using local memory (shared memory in CUDA), and the second does the same when you are operating on global memory. In both cases the barrier only synchronizes work-items within the same work-group. Be sure to use the correct one for your case.
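As a rough illustration of when the global flag matters, here is a small sketch in which work-items exchange values through a global buffer instead of local memory. The kernel and buffer names are hypothetical, and it assumes a 1D launch with no global offset and a global size that is a multiple of the work-group size.

```c
// Each work-item writes one value to a global scratch buffer, then reads
// the value written by its right-hand neighbour in the same work-group.
__kernel void shift_within_group(__global const float *in,
                                 __global float *scratch,
                                 __global float *out)
{
    size_t gid   = get_global_id(0);
    size_t lid   = get_local_id(0);
    size_t lsize = get_local_size(0);
    size_t base  = get_group_id(0) * lsize;  // first global index of this group

    scratch[gid] = in[gid] * 2.0f;

    // The data being exchanged lives in global memory, so the global flag
    // is the one that orders these writes before the reads below.
    barrier(CLK_GLOBAL_MEM_FENCE);

    out[gid] = scratch[base + (lid + 1) % lsize];
}
```

If a kernel exchanges data through both local and global memory at the same point, the flags can be combined as barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE). That combined form is probably the most conservative stand-in for __syncthreads() when you are not sure which memory the CUDA code relies on.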