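I have the following problem. I am trying to divide a shared array into smaller arrays and then use those arrays in other device functions. In my kernel function I do,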
for (int block_x = 0; block_x < blockDim.x; block_x++) {
    for (int block_y = 0; block_y < blockDim.y; block_y++) {
        //set up shared memory block
        extern __shared__ vec3f share[];
        vec3f *sh_pos = share;
        vec3f *sh_velocity = &sh_pos[blockDim.x*blockDim.y];
        vec3f *sh_density = &sh_velocity[blockDim.x*blockDim.y];
        vec3f *sh_pressure = &sh_density[blockDim.x*blockDim.y];
        //index by 2d threadidx's
        unsigned int index = (block_x * blockDim.x + threadIdx.x) + blockDim.x * gridDim.x * (block_y * blockDim.y + threadIdx.y);
        sh_pos[blockDim.x * threadIdx.x + threadIdx.y] = oldParticles[index].position();
        sh_velocity[blockDim.x * threadIdx.x + threadIdx.y] = oldParticles[index].velocity();
        sh_pressure[blockDim.x * threadIdx.x + threadIdx.y].x = oldParticles[index].pressure();
        sh_density[blockDim.x * threadIdx.x + threadIdx.y].x = oldParticles[index].density();
        __syncthreads();
        d_force_pressure(oldParticles[arr_pos],c_kernel_support);
        __syncthreads();
    }
}
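As far as I can tell, all of the "sh_" arrays are filled with zeros rather than the values I want. I can't tell what I am doing wrong. Note that vec3f is a vector of floats, just like the float3 data type. Also, I didn't think I could mix in plain floats for density and pressure, so I made them vectors and use only a single component. Then, for example, my d_force_pressure function is,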
__device__ void d_force_pressure(particle& d_particle, float h) {
    extern __shared__ vec3f share[];
    vec3f *sh_pos = share;
    vec3f *sh_velocity = &sh_pos[blockDim.x*blockDim.y];
    vec3f *sh_density = &sh_velocity[blockDim.x*blockDim.y];
    vec3f *sh_pressure = &sh_density[blockDim.x*blockDim.y];
    for (int i = 0; i < blockDim.x * blockDim.y; i++) {
        vec3f diffPos = d_particle.position() - sh_pos[i];
        d_particle.force() += GradFuncion(diffPos,h) * -1.0 * c_particle_mass * (d_particle.pressure()+sh_pressure[i].x)/(2.0*sh_density[i].x);
    }
}
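After calling this function I get NaNs, since I am dividing by zero (sh_density[i].x is, as far as I can tell, 0). Also, is this in general the correct way to load shared memory?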
The kernel is called by
dim3 block(BLOCK_SIZE,BLOCK_SIZE,1);
dim3 grid((int)ceil(sqrt(float(max_particles)) / (float(block.x*block.y))), (int)ceil(sqrt(float(max_particles)) / (float(block.x*block.y))), 1);
int sharedMemSize = block.x*block.y*4*sizeof(vec3f);
force_kernel<<< grid,block,sharedMemSize >>>(particle_ptrs[1],particle_ptrs[0],time_step);
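To make the pattern easier to test in isolation, here is a reduced standalone sketch of the same idea: one dynamic shared buffer carved into sub-arrays, filled by each thread, then read back after a barrier. It is not taken from the code above. The kernel name partition_demo, the use of float3 in place of vec3f, the 1D grid of 2D blocks, the row-major index, and the input values are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical reduced kernel: float3 stands in for vec3f, and the names
// partition_demo, pos_in, density_in, and out are made up for this sketch.
__global__ void partition_demo(const float3* pos_in, const float* density_in,
                               float* out, int n)
{
    // One dynamic shared buffer, carved into two sub-arrays of `tile` elements.
    extern __shared__ float3 share[];
    const int tile = blockDim.x * blockDim.y;
    float3* sh_pos = share;
    float3* sh_density = sh_pos + tile;  // only .x is used, as in the question

    // Row-major index inside the block and the matching global index
    // (assumes a 1D grid of 2D blocks).
    const int local = threadIdx.y * blockDim.x + threadIdx.x;
    const int global = blockIdx.x * tile + local;

    if (global < n) {
        sh_pos[local] = pos_in[global];
        sh_density[local].x = density_in[global];
    } else {
        // Padding for threads past the end; 1.0f keeps the division finite.
        sh_pos[local] = make_float3(0.0f, 0.0f, 0.0f);
        sh_density[local].x = 1.0f;
    }
    __syncthreads();  // reached by every thread in the block

    // Every thread reads the whole tile back out of shared memory.
    float sum = 0.0f;
    for (int i = 0; i < tile; ++i)
        sum += sh_pos[i].x / sh_density[i].x;

    if (global < n)
        out[global] = sum;
}

int main()
{
    const int n = 64;
    const dim3 block(8, 8, 1);
    const int tile = block.x * block.y;
    const dim3 grid((n + tile - 1) / tile, 1, 1);
    // Two float3 sub-arrays of `tile` elements each.
    const size_t shared_bytes = tile * 2 * sizeof(float3);

    float3* pos = nullptr;
    float* density = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&pos, n * sizeof(float3));
    cudaMallocManaged(&density, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        pos[i] = make_float3(float(i), 0.0f, 0.0f);
        density[i] = 2.0f;
    }

    partition_demo<<<grid, block, shared_bytes>>>(pos, density, out, n);
    const cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Each thread sums i / 2 for i = 0..63, so every entry should be 1008.
    printf("out[0] = %f, out[%d] = %f\n", out[0], n - 1, out[n - 1]);

    cudaFree(pos);
    cudaFree(density);
    cudaFree(out);
    return 0;
}
```

If the shared arrays were not being populated, the printed values would differ from 1008, so this may help separate a shared-memory problem from an indexing or launch-configuration problem in the full kernel.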