cuda - Why do I see a black screen when I try this code in CUDA?

Question

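I'm using Win8 and Nsight in "visual studio 2010", and I installed "310.90-notebook-win8-win7-winvista-32bit-international-whql" for my graphics card (9300m Gs). But when I try the code below, I see a black screen, followed by the error "Display driver stopped responding and has recovered". I know the problem is in "cudaMemcpy", but I don't know why.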

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#define N 8
__global__ void kernel(int *a)
{
int x = threadIdx.x + blockIdx.x * blockDim.x;
int step = x;
while(step<N){
    a[step] =  threadIdx.x;
    step += x;
}
}

int main()
{
int a[N],i=N,j=0;
for(;j<N;j++)
    a[j]=i--;

int *dev_a;
cudaMalloc( (void**)&dev_a, N * sizeof(int) );
cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);

    kernel<<<2,2>>>(dev_a);

cudaError_t cudaStatus = cudaMemcpy(a, dev_a,N-1 * sizeof(int), cudaMemcpyDeviceToHost);
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "cudaMemcpy failed!");
    //goto Error;
}

for(j=0;j<N;j++)printf("\n%d",a[j]);

int t;
scanf("%d",&t);
}

score 10 · Accepted Answer

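In the kernel, the thread with threadIdx.x = 0 and blockIdx.x = 0, i.e. the first thread of the first block, will run indefinitely, causing the kernel to crash. Because cudaMemcpy waits for the kernel to finish, the hang shows up at that call, even though the cause is inside the kernel.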

When threadIdx.x = 0 and blockIdx.x = 0, the kernel code becomes:

int x = 0;
int step = 0;
while(step<N)
{
    a[step] =  0;
    step += 0; //This will create infinite loop
}
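
One way to avoid the stall is to advance each thread by a value that can never be zero. The sketch below is a minimal grid-stride variant, not necessarily the logic the asker intended: it assumes each thread should start at its global index and step by the total number of launched threads. The names kernel_fixed and run_fixed_kernel are hypothetical and only serve the illustration.

```cuda
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#define N 8

// Grid-stride variant: each thread starts at its global index and
// advances by the total number of launched threads, which is never zero,
// so the loop terminates for every thread (including thread 0 of block 0).
__global__ void kernel_fixed(int *a)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;  // 4 for a <<<2,2>>> launch
    for (int step = x; step < N; step += stride) {
        a[step] = threadIdx.x;
    }
}

// Hypothetical host helper: runs the kernel with the question's <<<2,2>>>
// launch and copies all N results into host_out.
cudaError_t run_fixed_kernel(int *host_out)
{
    int *dev_a = 0;
    cudaError_t status = cudaMalloc((void**)&dev_a, N * sizeof(int));
    if (status != cudaSuccess) return status;

    kernel_fixed<<<2, 2>>>(dev_a);
    status = cudaGetLastError();  // reports launch errors, if any
    if (status == cudaSuccess) {
        // The copy runs after the kernel on the default stream and blocks
        // until the data has arrived on the host.
        status = cudaMemcpy(host_out, dev_a, N * sizeof(int),
                            cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_a);
    return status;
}
```

With the <<<2,2>>> launch the four threads have global indices 0 to 3 and a stride of 4, so every element of the array is written exactly once, and each element receives the threadIdx.x (0 or 1) of the thread that wrote it.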

Also (possibly a typo), there is a logical error in the following line of your code:

cudaError_t cudaStatus = cudaMemcpy(a, dev_a,N-1 * sizeof(int), cudaMemcpyDeviceToHost);

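Given operator precedence in C, the expression N-1 * sizeof(int) evaluates to N-4 (if sizeof(int) is 4). With N = 8 that is 4 bytes, so only the first int is copied back; the size was presumably meant to be N * sizeof(int).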
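
The precedence issue can be checked in isolation. The following sketch compares the byte count as written with the byte count for the whole array, assuming the intent was to copy all N elements. The helper names bytes_as_written, bytes_for_array and copy_back are hypothetical.

```cuda
#include "cuda_runtime.h"
#include <stddef.h>

#define N 8

// Byte count exactly as written in the question.
// Multiplication binds tighter than subtraction, so this is
// N - (1 * sizeof(int)), i.e. 8 - 4 = 4 when sizeof(int) is 4.
size_t bytes_as_written(void) { return N - 1 * sizeof(int); }

// Byte count for the whole array, which the print loop needs
// (assumption: all N elements should be copied back).
size_t bytes_for_array(void) { return N * sizeof(int); }

// Hypothetical helper: copy all N ints from device memory to the host.
cudaError_t copy_back(int *host_a, const int *dev_a)
{
    return cudaMemcpy(host_a, dev_a, bytes_for_array(),
                      cudaMemcpyDeviceToHost);
}
```

If the intent really was to copy N-1 elements, the size would need explicit parentheses: (N-1) * sizeof(int).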
