c++ - MATLAB CUDA kernel object - error using gather?

Question

I have the following CUDAKernel object:

[image: workspace]

which I invoke using:

kernel1 = parallel.gpu.CUDAKernel('kcc2.ptx', 'kcc2.cu');
kernel1.ThreadBlockSize = 256;
kernel1.GridSize = 4;

gpuTM = gpuArray(single(TM));
gpuLTM = gpuArray(single(LTM));
gpuLTMP = gpuArray(int32(LTMP));

rng('shuffle');
randz = abs(randi(2^53 -1, [1, r_max]));
GPUrands = gpuArray(double(randz));

[x,y] = gather(feval(kernel1, gpuLTM, gpuLTMP, F_M, Force, GPUrands, ...
    (r_max), single(Lamda), single(Fixed_dt), single(r), single(q), ...
    single(gama_B), single(gama_M), single(mu_B), single(mu_M), ...
    single(KB_p_ref), single(KB_m_ref), single(f_ref), single(g_ref), ...
    single(Kca_p_ref), single(Kca_m_ref)));

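As you can see above, I have 2 left-hand-side arguments, but I get this error in MATLAB:

Error using gpuArray/gather: Too many output arguments.

I don't understand. All of my arguments line up between the CUDA kernel and MATLAB. As you can see, the kernel function has the following C++ prototype: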

__global__ void myKern(const float *transMatrix, const int *pointerMatrix, 
    float *masterForces, float *Force, const double *rands, const int r_max, 
    const float lamda, const float dt, const float r, const float q, 
    const float gama_B, const float gama_M, const float mu_B, const float mu_M, 
    const float KB_p_ref, const float KB_m_ref, const float f_ref, 
    const float g_ref, const float Kca_p_ref, const float Kca_m_ref)

It should return only masterForces and Force ([x,y] in MATLAB), since they are the only non-const pointers.

What could be the problem?

score 3 · Accepted Answer

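You cannot apply gather directly to a call that returns multiple outputs. When feval(...) is nested inside gather(...), only its first output is passed on, and gather called with a single input returns a single output, so requesting [x,y] fails. Gather each output on a separate line instead (this is basic MATLAB syntax):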

[x,y] = feval(kernel1, ...);
x = gather(x);
y = gather(y);

Evaluating the CUDA kernel returns two variables of type gpuArray (data stored on the GPU). You can then transfer the data to CPU memory by applying gather to each variable.
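The nesting rule behind this error can be reproduced without a GPU. The following is a minimal sketch, not taken from the original question: it uses max in place of feval and abs in place of gather, purely as stand-ins, and the variable names (v, m, idx, a, b, nestedWorked) are hypothetical.

```matlab
% CPU-only illustration of the nesting rule; no GPU or toolbox is needed.
% Assumption: max stands in for feval (two outputs) and abs stands in for
% gather called with one input (one output).
v = [3 -7 5];

% Requesting two outputs directly from max works.
[m, idx] = max(v);

% Wrapping the call does not: abs receives only the first output of max
% and can itself return only one value, so asking for two outputs errors.
try
    [a, b] = abs(max(v));
    nestedWorked = true;
catch err
    nestedWorked = false;
    disp(err.message);   % expected to complain about too many outputs
end

% The fix mirrors the answer: take both outputs first, then wrap each one.
[m, idx] = max(v);
a = abs(m);
b = abs(idx);
```

The same pattern applies to the kernel call in the question: request [x,y] from feval first, then call gather on x and on y separately.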
