visual-studio-2010 - Two consecutive "cudaMallocPitch" calls make the code fail

Question

I wrote a simple piece of CUDA code, shown below:

//Allocate the first 2d array "deviceArray2DInput"
if(cudaMallocPitch((Float32**) &deviceArray2DInput, &devicePitch, sizeof(Float32)*deviceColNumber,deviceRowNumber) == cudaErrorMemoryAllocation){
    return -1;
}

//Allocate the second 2d array "deviceArray2DOutput". It is supposed to hold the output of some process.
if(cudaMallocPitch((Float32**) &deviceArray2DOutput, &devicePitch,sizeof(Float32)*deviceRowNumber,deviceColNumber) == cudaErrorMemoryAllocation){
    return -1;
}

//Copy data from "hostArrayR" to "deviceArray2DInput" (#1)
cudaMemcpy2D(deviceArray2DInput,devicePitch,hostArrayR,sizeof(Float32)*colNumber,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyHostToDevice);

//Clear the first 10000 elements in "hostArrayR" for verification.
for(int i = 0; i < 10000; ++i){
    hostArrayR[i] = 0;
}

//Copy data back from "deviceArray2DInput" to "hostArrayR"(#2)
cudaMemcpy2D(hostArrayR,sizeof(Float32)*colNumber,deviceArray2DInput,devicePitch,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyDeviceToHost);

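When I comment out the second allocation block, the code works fine: it copies the data from the host array "hostArrayR" to the device array "deviceArray2DInput" and back. However, when both allocation blocks are present, the copied-back "hostArrayR" is empty (no data comes back from the device).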

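I am sure the data is in "hostArrayR" at line (#1), but after line (#2) it is not there. I cleared the first 10000 elements (far fewer than the size of the array) to verify that no data was returned.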

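I am using Nvidia Nsight 2.2 on Visual Studio 2010. The array size is 1024x768 and the data are 32-bit floats. My graphics card is a GTX570. There seems to be no memory allocation error (otherwise the code would return before copying).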

I have not tried "cudaMalloc()", because I prefer "cudaMallocPitch()" for its memory alignment.

score 3 · Accepted Answer

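You should check API calls against cudaSuccess, not against one specific error.
You should check the error values returned by the memcpys.
You overwrite devicePitch with the second cudaMallocPitch() call. The arrays have different shapes, so they may have different pitches.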
