cuda - Segmentation fault with a 3D array

Question

I am trying to use a 3D array (200x200x100) in CUDA.

When I change the z dimension (model_num) from 4 to 5, I get a segmentation fault. Why does this happen, and how can I fix it?

#include <cstdlib>

const int nrcells = 200;
const int nphicells = 200;
const int model_num = 5; //So far, 4 is the maximum model_num that works. At 5 and after, there is a segmentation fault

__global__ void kernel(float* mgridb)
{
    // One block per (r, model) pair, one thread per phi cell
    const unsigned long long int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    const int tx = threadIdx.x; // phi index
    const int ty = blockIdx.x;  // r index
    const int tz = blockIdx.y;  // model index

    if(tx >= 0 && tx < nphicells && ty >= 0 && ty < nrcells && tz >= 0 && tz < model_num){
        //Do stuff with mgridb[i]
    }
}

int main (void)
{
    unsigned long long int size_matrices = nphicells*nrcells*model_num;
    unsigned long long int mem_size_matrices = sizeof(float) * size_matrices;

    float *h_mgridb = (float *)malloc(mem_size_matrices);
    float mgridb[nphicells][nrcells][model_num];

    for(int k = 0; k < model_num; k++){
        for(int j = 0; j < nrcells; j++){
            for(int i = 0; i < nphicells; i++){
                mgridb[i][j][k] = 0;
            }
        }
    }
    float *d_mgridb;

    cudaMalloc( (void**)&d_mgridb, mem_size_matrices );
    cudaMemcpy(d_mgridb, h_mgridb, mem_size_matrices, cudaMemcpyHostToDevice);

    int threads = nphicells;
    dim3 blocks(nrcells, model_num, 1);
    kernel<<<blocks,threads>>>(d_mgridb);
    cudaMemcpy( h_mgridb, d_mgridb, mem_size_matrices, cudaMemcpyDeviceToHost);
    cudaFree(d_mgridb);
    free(h_mgridb);
    return 0;
}

score 3 · Accepted Answer

This is stored on the stack:

float mgridb[nphicells][nrcells][model_num];

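Your stack space is limited. When you exceed the amount that can be stored on the stack, you get a seg fault, either at the point of allocation or as soon as you try to access the array.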

Use malloc instead. It allocates storage on the heap, which has a much higher limit.

None of the above has anything to do with CUDA.

You may also need to adjust how you access the array, but handling a flattened array with pointer indexing is not hard.

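Your code actually looks strange: you create a properly sized array h_mgridb with malloc and then copy that array to the device (to d_mgridb), but the loop zeroes mgridb, so h_mgridb is never initialized before the copy. It isn't clear what purpose mgridb serves in your code. h_mgridb and mgridb are not the same array.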
