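I can't figure out how to retrieve a 3D array from the GPU. I want to allocate memory for the 3D array in host code, call the kernel, which fills the array, and then copy the 3D array back into the return variable of the mexFunction (host code).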

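I have made several attempts; this is my latest code. The results are all '0' when they should be '7'. Can anyone tell me where I'm going wrong? It may have something to do with the 3D parameters, which I don't think I fully understand.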

simulate3DArrays.cpp

/* Device code */
__global__ void simulate3DArrays(cudaPitchedPtr devPitchedPtr, 
                             int width, 
                             int height, 
                             int depth) 
{
int threadId;
threadId = (blockIdx.x * blockDim.x) + threadIdx.x;

size_t pitch = devPitchedPtr.pitch; 

for (int widthIndex = 0; widthIndex < width; widthIndex++) {
    for (int heightIndex = 0; heightIndex < height; heightIndex++) {

        *((double*)(((char*)devPitchedPtr.ptr + threadId * pitch * height) + heightIndex * pitch) + widthIndex) = 7.0;

    }
}    
}

mexFunction.cu

/* Host code */
#include <stdio.h>
#include "mex.h"

/* Kernel function */
#include "simulate3DArrays.cpp"

/* Define some constants. */
#define width  5
#define height 9
#define depth  6

void displayMemoryAvailability(mxArray **MatlabMemory);

void mexFunction(int        nlhs,
             mxArray    *plhs[],
             int        nrhs,
             mxArray    *prhs[])
{

double *output;
mwSize ndim3 = 3;
mwSize dims3[] = {height, width, depth};

plhs[0] = mxCreateNumericArray(ndim3, dims3, mxDOUBLE_CLASS, mxREAL);
output = mxGetPr(plhs[0]);

cudaExtent extent = make_cudaExtent(width * sizeof(double), height, depth);
cudaPitchedPtr devicePointer;
cudaMalloc3D(&devicePointer, extent);


simulate3DArrays<<<1,depth>>>(devicePointer, width, height, depth);

cudaMemcpy3DParms deviceOuput = { 0 };
deviceOuput.srcPtr.ptr = devicePointer.ptr;
deviceOuput.srcPtr.pitch = devicePointer.pitch;
deviceOuput.srcPtr.xsize = width;
deviceOuput.srcPtr.ysize = height;

deviceOuput.dstPtr.ptr = output;
deviceOuput.dstPtr.pitch = devicePointer.pitch;
deviceOuput.dstPtr.xsize = width;
deviceOuput.dstPtr.ysize = height;

deviceOuput.kind = cudaMemcpyDeviceToHost;
/* copy 3d array back to 'output' */
cudaMemcpy3D(&deviceOuput);


return;
} /* End Mexfunction */

1 Answer

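The basic problem seems to be that you are instructing cudaMemcpy3D to copy zero bytes, because the parameters you pass to the API do not include a non-zero extent, which is what defines the size of the transfer.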

Your transfer could probably be as simple as:

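cudaMemcpy3DParms deviceOuput = { 0 };
deviceOuput.srcPtr = devicePointer;
/* The host buffer is tightly packed, so its pitch is one row of doubles. */
deviceOuput.dstPtr = make_cudaPitchedPtr(output, width * sizeof(double), width, height);
/* The extent (width in bytes, height, depth) sets the size of the copy. */
deviceOuput.extent = extent;
deviceOuput.kind = cudaMemcpyDeviceToHost;

cudaMemcpy3D(&deviceOuput);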
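To make the role of the extent concrete, here is a rough host-only C++ sketch of what a pitched 3D copy does. It is not the CUDA implementation: `PitchedBuffer`, `Extent`, `copyPitched3D` and `simulateTransfer` are hypothetical stand-ins I made up for illustration, and the 64-byte source pitch is an assumed value (the real one is chosen by cudaMalloc3D). With a zero extent the loops never run and the output stays at 0, which matches the symptom in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-ins for cudaPitchedPtr and cudaExtent.
struct PitchedBuffer {
    unsigned char* ptr;  // base address
    std::size_t pitch;   // bytes from one row to the next, padding included
    std::size_t ysize;   // rows per slice
};

struct Extent {
    std::size_t widthBytes;  // row width in bytes, not elements
    std::size_t height;      // rows per slice
    std::size_t depth;       // number of slices
};

// Row-by-row copy between two pitched buffers; returns the bytes copied.
std::size_t copyPitched3D(const PitchedBuffer& dst, const PitchedBuffer& src,
                          const Extent& e) {
    std::size_t copied = 0;
    for (std::size_t z = 0; z < e.depth; ++z) {
        for (std::size_t y = 0; y < e.height; ++y) {
            const unsigned char* s = src.ptr + (z * src.ysize + y) * src.pitch;
            unsigned char* d = dst.ptr + (z * dst.ysize + y) * dst.pitch;
            std::memcpy(d, s, e.widthBytes);
            copied += e.widthBytes;
        }
    }
    return copied;
}

// Builds a padded 5 x 9 x 6 source holding 7.0 and copies it into a tightly
// packed, zero-initialised destination, with or without a non-zero extent.
std::vector<double> simulateTransfer(bool setExtent) {
    const std::size_t width = 5, height = 9, depth = 6;
    const std::size_t srcPitch = 64;  // assumed pitch, not a value CUDA guarantees
    assert(srcPitch % sizeof(double) == 0);
    const std::size_t rowElems = srcPitch / sizeof(double);

    std::vector<double> source(rowElems * height * depth, -1.0);  // -1.0 marks padding
    for (std::size_t z = 0; z < depth; ++z)
        for (std::size_t y = 0; y < height; ++y)
            for (std::size_t x = 0; x < width; ++x)
                source[(z * height + y) * rowElems + x] = 7.0;

    std::vector<double> output(width * height * depth, 0.0);

    PitchedBuffer src{reinterpret_cast<unsigned char*>(source.data()), srcPitch, height};
    PitchedBuffer dst{reinterpret_cast<unsigned char*>(output.data()),
                      width * sizeof(double), height};

    Extent e{0, 0, 0};  // what the question's code effectively passed
    if (setExtent) e = Extent{width * sizeof(double), height, depth};
    copyPitched3D(dst, src, e);
    return output;
}
```

Calling `simulateTransfer(true)` yields an output buffer where every element was copied from the padded source, while `simulateTransfer(false)` leaves the buffer untouched. Note that the destination pitch is the packed row size, because the host buffer has no padding.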

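I can't comment on whether your use of the MEX interface is correct, but the kernel looks superficially correct, and I don't see anything else obviously wrong without getting out a compiler and trying to run your code with Matlab, which I can't do.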
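For what it's worth, the address arithmetic in the kernel's store can be checked on the host without a GPU. The sketch below is only an emulation under assumptions: `elementAt` and `fillLikeKernel` are hypothetical names, a plain loop stands in for the `<<<1,depth>>>` launch, and the pitch is passed in by hand rather than coming from cudaMalloc3D. It says nothing about the MEX side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same address arithmetic as the kernel's store, written as a function:
// skip z slices of (pitch * height) bytes, then y rows of pitch bytes,
// then index x doubles into the row.
double* elementAt(void* base, std::size_t pitch, std::size_t height,
                  std::size_t x, std::size_t y, std::size_t z) {
    char* slice = static_cast<char*>(base) + z * pitch * height;
    char* row = slice + y * pitch;
    return reinterpret_cast<double*>(row) + x;
}

// Host loop standing in for the kernel launch: iteration threadId plays the
// part of one CUDA thread and fills one slice. Returns the padded storage.
std::vector<double> fillLikeKernel(std::size_t width, std::size_t height,
                                   std::size_t depth, std::size_t pitch) {
    assert(pitch % sizeof(double) == 0);
    assert(pitch >= width * sizeof(double));

    // Zero-initialised backing store, padding included.
    std::vector<double> storage(pitch / sizeof(double) * height * depth, 0.0);

    for (std::size_t threadId = 0; threadId < depth; ++threadId)
        for (std::size_t w = 0; w < width; ++w)
            for (std::size_t h = 0; h < height; ++h)
                *elementAt(storage.data(), pitch, height, w, h, threadId) = 7.0;

    return storage;
}
```

With `fillLikeKernel(5, 9, 6, 64)` each row holds 5 payload doubles followed by padding that the stores never touch, so the indexing stays inside each thread's own slice.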

answered 2012-08-09T12:12:10.987