我遇到了内核启动问题。我有一个使用一个大内核的程序。由于同步问题,现在我需要将其分成两部分。第一个内核做了一些初始化工作,并获得了传递给第二个内核的参数子集。只运行第一个内核工作正常。由于缺少初始化,仅运行第二个内核在执行时失败,但内核本身已启动。连续运行这两个内核会使第二个内核因“无效参数”错误而失败。如有必要,我将提供代码,但现在无法弄清楚它可能有什么帮助。提前致谢。
编辑:这里请求的启动代码:
void DeviceManager::integrate(){
assert(hostArgs->neighborhoodsSize > 0);
size_t maxBlockSize;
size_t blocks;
size_t threadsPerBlock;
// init patch kernel
maxBlockSize = 64;
blocks = (hostArgs->patchesSize /maxBlockSize);
if(0 != hostArgs->patchesSize % maxBlockSize){
blocks++;
}
threadsPerBlock = maxBlockSize;
std::cout << "blocks: " << blocks << ", threadsPerBlock: " << threadsPerBlock << std::endl;
initPatchKernel<CUDA_MAX_SPACE_DIMENSION><<<blocks,threadsPerBlock>>>(devicePatches, hostArgs->patchesSize);
cudaDeviceSynchronize();
//calc kernel
maxBlockSize = 64;
blocks = (hostArgs->neighborhoodsSize /maxBlockSize);
if(0 != hostArgs->neighborhoodsSize % maxBlockSize){
blocks++;
}
threadsPerBlock = maxBlockSize;
size_t maxHeapSize = hostArgs->patchesSize * (sizeof(LegendreSpace) + sizeof(LinearSpline)) + hostArgs->neighborhoodsSize * (sizeof(ReactionDiffusionCCLinearForm) + sizeof(ReactionDiffusionCCBiLinearForm));
std::cout << "maxHeapSize: " << maxHeapSize << std::endl;
cudaDeviceSetLimit(cudaLimitMallocHeapSize, maxHeapSize);
std::cout << "blocks: " << blocks << ", threadsPerBlock: " << threadsPerBlock << std::endl;
integrateKernel<CUDA_MAX_SPACE_DIMENSION><<<blocks,threadsPerBlock>>>(deviceNeighborhoods, hostArgs->neighborhoodsSize, devicePatches, hostArgs->patchesSize, hostArgs->biLinearForms, hostArgs->linearForms, deviceRes);
cudaDeviceSynchronize();
}
内存传输和分配应该不是问题,因为它在仅使用一个内核时工作。
编辑 2:在通过包装函数在调试模式下构建时,我在每次内核调用后检查错误。因此,在每次内核调用之后,都会执行以下操作:
cudaError_t cuda_result_code = cudaGetLastError();
if (cuda_result_code!=cudaSuccess) {
fprintf("message: %s\n",cudaGetErrorString(cuda_result_code));
}
很抱歉没有提到这一点,包装器不是我的,很抱歉没有粘贴这个技巧。失败前的输出如下:
blocks: 1, threadsPerBlock: 64
maxHeapSize: 4480
blocks: 1, threadsPerBlock: 64
message: invalid argument