exception - Exception 'cudaError_enum' thrown in cudaGetExportTable (CUDA runtime library)?

Question

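I am using DDT to debug an MPI-based CUDA program. My code aborts when the CUDA runtime library (libcudart) throws an exception inside the (undocumented) function cudaGetExportTable. This happens during calls to cudaMalloc and cudaThreadSynchronize from my code. (UPDATE: using cudaDeviceSynchronize instead gives the same error.)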

Why does libcudart throw an exception (I am using the C API, not the C++ API) before I can detect the error in my code, either through its cudaError_t return value or with CHECKCUDAERROR?

(I am using the CUDA 4.2 SDK for Linux.)

Output:

Process 9: terminate called after throwing an instance of 'cudaError_enum'
Process 9: terminate called recursively

Process 20: terminate called after throwing an instance of 'cudaError'
Process 20: terminate called recursively

My code:

cudaThreadSynchronize();
CHECKCUDAERROR("cudaThreadSynchronize()");

Another code snippet:

const size_t t;  // from argument to function
void* p=NULL;
const cudaError_t r=cudaMalloc(&p, t);
if (r!=cudaSuccess) {
    ERROR("cudaMalloc failed.");
}

Partial backtrace:

Process 9:
cudaDeviceSynchronize()
-> cudaGetExportTable()
   -> __cxa_throw

Process 20:
cudaMalloc()
-> cudaGetExportTable()
   -> cudaGetExportTable()
      -> __cxa_throw

Memory debugging errors:

Processes 0,2,4,6-9,15-17,20-21:
Memory error detected in Malloc_cuda_gx (cudamalloc.cu:35):
dmalloc bad admin structure list.

This line is the cudaMalloc snippet shown above. Also:

Processes 1,3,5,10-11,13-14,18-19,23:
Memory error detected in vfprintf from /lib64/libc.so.6:
dmalloc bad admin structure list.

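Furthermore, dmalloc detects similar memory errors when running on 3 cores/GPUs per node instead of 4 GPUs per node. Outside debug mode, however, the code runs perfectly fine with 3 GPUs per node (as far as I can tell).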

score 1 · Accepted Answer

Recompile with gcc. (I was using icc to compile my code.)

After recompiling, the exception still appears while debugging, but when I continue past it I get the real CUDA errors:

Process 9: gadget_cuda_gx.cu:116: ERROR in gadget_cuda_gx.cu:919: CUDA ERROR:   cudaThreadSynchronize(): unspecified launch failure
Process 20: cudamalloc.cu:38: ERROR all CUDA-capable devices are busy or unavailable, cudaMalloc failed to allocate 856792 bytes = 0.817101 Mb

Valgrind found no memory corruption or leaks in my code (compiled with either gcc or icc), but it did find some leaks in libcudart.

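UPDATE: Still not fixed. It seems to be the same problem as the one reported in answer #2 of this thread: cudaMemset failed on __device__ variable. It seems the runtime is not working as it should...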
