Sorry for starting a new question. I tried using histogram.cu (from the Thrust examples) to compute the concentration, but that brute-force approach is too slow once there are more than 100k particles (it takes about 5 seconds). What I want is to count how many particles fall in each cell of the domain. For example, assume the cell size is 1 x 1 x 1 and the domain size is 100 x 100 x 100, with 50k particles at position (50, 50, 50), 20k particles at (15, 15, 15), and 30k particles at (20, 20, 20). Then cell (50, 50, 50) has 50k particles, cell (15, 15, 15) has 20k, cell (20, 20, 20) has 30k, and all other cells are empty.
My code is below. posPtr is the array of particle positions, and device_cons is the output: the concentration (particle count) of each cell. It doesn't produce the right counts, please help. (I think there is a problem with how I handle the shared memory.)
__global__ void concentration_kernel(float3* posPtr, uint* device_cons)
{
    __shared__ uint cache[256];
    uint x = threadIdx.x + blockIdx.x * blockDim.x;
    uint y = threadIdx.y + blockIdx.y * blockDim.y;
    uint offset = x + y * blockDim.x * gridDim.x;
    float3 posf3 = posPtr[offset];//make_float3(43.5,55,0.66);//
    uint cellIndex = (uint)(posf3.z+1)*153*110 + (uint)(posf3.y)*153 + (uint)posf3.x;
    cache[threadIdx.x] = device_cons[cellIndex];
    __syncthreads();
    uint a = cache[threadIdx.x];
    a++;
    cache[threadIdx.x] = a;
    __syncthreads();
    device_cons[cellIndex] = cache[threadIdx.x];
}
Thanks!
For better understanding, here is my C++ (CPU) code:
for(uint i=0; i<numParticles; i++)
{
    float3 posf3 = dev_pos[i];
    uint cellIndex = (uint)(posf3.z)*153*110 +
                     (uint)(posf3.y)*153 + (uint)posf3.x;
    dev_con[cellIndex]++;
}
The domain size is (152, 110, 30), which is why I compute
    uint cellIndex = (uint)(posf3.z)*153*110 +
                     (uint)(posf3.y)*153 + (uint)posf3.x;
to get the index of the cell each particle is in. Thanks.
I figured it out: use atomicAdd().
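For anyone who finds this later, here is a minimal sketch of what the atomicAdd() version might look like. The original kernel loses counts because many threads read, increment, and write back the same device_cons entry at the same time; staging the value in shared memory does not change that, since each thread only ever touches its own cache slot. With atomicAdd() the read-modify-write on the cell counter happens as one indivisible step. The names concentrationKernel and cellIndexOf, the 1D launch, the numParticles bounds check, and the 100 x 100 x 100 grid (taken from the example in the question) are my assumptions, not the original code. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed grid size: the 100 x 100 x 100 example domain from the question.
constexpr unsigned int NX = 100, NY = 100, NZ = 100;

// Hypothetical helper: flatten a position into a cell index (cell size 1 x 1 x 1).
__host__ __device__ inline unsigned int cellIndexOf(float3 p)
{
    return (unsigned int)p.z * NX * NY + (unsigned int)p.y * NX + (unsigned int)p.x;
}

// One thread per particle; each thread atomically increments its cell's counter.
__global__ void concentrationKernel(const float3* pos, unsigned int* cons,
                                    unsigned int numParticles)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;           // guard the last, partially filled block
    atomicAdd(&cons[cellIndexOf(pos[i])], 1u);
}

int main()
{
    // Test data from the question: 50k, 20k and 30k particles in three cells.
    const unsigned int numParticles = 100000;
    std::vector<float3> hPos(numParticles);
    for (unsigned int i = 0; i < numParticles; ++i) {
        if (i < 50000)      hPos[i] = make_float3(50.f, 50.f, 50.f);
        else if (i < 70000) hPos[i] = make_float3(15.f, 15.f, 15.f);
        else                hPos[i] = make_float3(20.f, 20.f, 20.f);
    }

    const size_t numCells = (size_t)NX * NY * NZ;
    float3* dPos = nullptr;
    unsigned int* dCons = nullptr;
    cudaMalloc(&dPos, numParticles * sizeof(float3));
    cudaMalloc(&dCons, numCells * sizeof(unsigned int));
    cudaMemcpy(dPos, hPos.data(), numParticles * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemset(dCons, 0, numCells * sizeof(unsigned int));  // counters must start at zero

    const unsigned int threads = 256;
    const unsigned int blocks = (numParticles + threads - 1) / threads;
    concentrationKernel<<<blocks, threads>>>(dPos, dCons, numParticles);

    std::vector<unsigned int> hCons(numCells);
    cudaMemcpy(hCons.data(), dCons, numCells * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    // Prints 50000, 20000 and 30000 for the three occupied cells.
    printf("cell(50,50,50) = %u\n", hCons[cellIndexOf(make_float3(50.f, 50.f, 50.f))]);
    printf("cell(15,15,15) = %u\n", hCons[cellIndexOf(make_float3(15.f, 15.f, 15.f))]);
    printf("cell(20,20,20) = %u\n", hCons[cellIndexOf(make_float3(20.f, 20.f, 20.f))]);

    cudaFree(dPos);
    cudaFree(dCons);
    return 0;
}
```

For the real domain, NX and NY would become 153 and 110 as in the index formula above, and NZ must be large enough for the biggest z value. Note that atomic updates to the same address are serialized, so a test case with all particles in three cells is the worst case for this kernel; a more evenly spread particle set should see less contention.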