cuda - Dynamic parallelism cudaDeviceSynchronize() crashes

Question

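I have a kernel that launches another (empty) kernel. However, when the parent kernel calls cudaDeviceSynchronize(), the kernel crashes and execution returns straight to the host. The memory checker does not report any memory access issues. Does anyone know what could cause such uncivilized behavior?

The crash seems to happen only when I run the code from the debugger (Visual Studio -> Nsight -> Start CUDA Debugging), and even then not on every run: sometimes it crashes, and sometimes it finishes normally.

Here is the complete code that reproduces the problem: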

#include <cuda_runtime.h>
#include <curand_kernel.h>
#include "device_launch_parameters.h"
#include <stdio.h>

#define CUDA_RUN(x_, err_) {cudaStatus = x_; if (cudaStatus != cudaSuccess) {fprintf(stderr, err_ "  %d - %s\n", cudaStatus, cudaGetErrorString(cudaStatus)); int k; scanf("%d", &k); goto Error;}}

struct computationalStorage {
    float rotMat;
};

__global__ void drawThetaFromDistribution() {}

__global__ void chainKernel() {
    computationalStorage* c = (computationalStorage*)malloc(sizeof(computationalStorage));
    // Bail out on allocation failure instead of dereferencing a null pointer below.
    if (!c) { printf("malloc error\n"); return; }
    c->rotMat = 1.0f;

    int n = 1;
    while (n < 1000) {
        cudaError_t err;

        drawThetaFromDistribution<<<1, 1>>>();
        if ((err = cudaGetLastError()) != cudaSuccess)
            printf("drawThetaFromDistribution Sync kernel error: %s\n", cudaGetErrorString(err));
        printf("0");
        if ((err = cudaDeviceSynchronize()) != cudaSuccess)
          printf("drawThetaFromDistribution Async kernel error: %s\n", cudaGetErrorString(err));
        printf("1\n");
        ++n;
    }

    free(c);
}

int main() {
    cudaError_t cudaStatus;
    // Choose which GPU to run on, change this on a multi-GPU system.
    CUDA_RUN(cudaSetDevice(0), "cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");

    // Set to use on chip memory 16KB for shared, 48KB for L1
    CUDA_RUN(cudaDeviceSetCacheConfig ( cudaFuncCachePreferL1 ), "Can't set CUDA to use on chip memory for L1");
    // Set a large heap
    CUDA_RUN(cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1024 * 10 * 192), "Can't set the Heap size");

    chainKernel<<<10, 192>>>();
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        printf("Something was wrong! Error code: %d", cudaStatus);
    }

    CUDA_RUN(cudaDeviceReset(), "cudaDeviceReset failed!");

Error:
    int k;
    scanf("%d",&k);
    return 0;
}

If everything goes well, I expect to see:

00000000000000000000000....0000000000000001
1
1
1
1
....

That is what I get when everything works. But when it crashes, I get:

000000000000....0000000000000Something was wrong! Error code: 30

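As you can see, the statement err = cudaDeviceSynchronize(); inside the kernel never completes. Execution goes straight back to the host, where the host-side cudaDeviceSynchronize(); fails with error code 30 (cudaErrorUnknown).

System: CUDA 5.5, NVidia-Titan (headless), Windows 7 x64, Win32 application. UPDATE: there is an additional Nvidia card driving the display; Nsight 3.2.0.13289.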

score 1 · Accepted Answer

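That last fact (the crash only appears under the debugger) may be the critical one. You don't mention which version of Nsight VSE you are using, nor your exact machine configuration (e.g. are there other GPUs in the machine, and if so, which one is driving the display?), but at least until recently it was not possible to debug a dynamic parallelism application with Nsight VSE in single-GPU mode.

The current feature matrix also indicates that single-GPU CDP (CUDA Dynamic Parallelism) debugging is not yet supported.

One possible workaround in your case may be to add another GPU to drive the display and make the Titan card headless (i.e. don't attach any monitors to it and don't extend the Windows desktop onto that GPU).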
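
To confirm which GPU in a multi-GPU machine is the one to debug on, it can help to list the devices from code. The following is a minimal sketch, not part of the original post: the helper name supportsDynamicParallelism is hypothetical, and it relies only on the fact that dynamic parallelism needs compute capability 3.5 or higher. The kernelExecTimeoutEnabled field is only a hint that a GPU may be driving a display; depending on the driver model, the watchdog may be reported for a GPU with no monitor attached as well.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: dynamic parallelism requires compute capability >= 3.5.
static bool supportsDynamicParallelism(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        // The watchdog flag is a hint only; it does not prove a display is attached.
        printf("Device %d: %s, compute %d.%d, dynamic parallelism: %s, watchdog: %s\n",
               dev, prop.name, prop.major, prop.minor,
               supportsDynamicParallelism(prop.major, prop.minor) ? "yes" : "no",
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

The index printed for the Titan is the value to pass to cudaSetDevice() in the reproduction code, which currently assumes device 0.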

I ran your application both with and without cuda-memcheck, and I did not see any problems with it.
