cuda - CUDA: How to store persistent data?

Question

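I want to store background image data on the device in CUDA. Later, when I read a new scene from a video source, I want to send the new scene to the GPU as a foreground image and subtract it from the background image. I don't want to resend the background image to the GPU for every scene. How can I do this?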

score 3 · Accepted Answer

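Store the background image in a device memory array (i.e. on the GPU). Then, when you read a foreground image, use cudaMemcpy to copy it to another device memory array. Then launch a kernel that takes the two device memory arrays as arguments and performs the image subtraction. Should be simple.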

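Assuming you use default context creation and this all runs in the same CPU thread, you don't have to worry about doing anything specific to keep your CUDA context "intact", as Bart commented. However, if you do any CPU multi-threading, you will need to do some context management.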

score 1 · Accepted Answer

Here is a simple example..

int main(int argc, char **argv) {
    uint *hostBackground, *hostForeground; //new uint[]..
    uint *background, *foreground;

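First allocate device memory for the background and foreground data..

    cudaMalloc(&background, ..); //cudaMalloc takes the address of the device pointer
    cudaMalloc(&foreground, ..);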

Then load the background data onto the device once

    cudaMemcpy(background, hostBackground, ..); //copy to device..

Then read the foreground data for each new frame

    while (applicationRuns) {
        readImage(hostForeground); //read image..
        cudaMemcpy(foreground, hostForeground, ..); //copy to device

        //operate on foreground..
        substruct_kernel<<<blocks, threads>>>(foreground, background, width, height); //grid size first, then block size

        cudaMemcpy(hostForeground, foreground, ..); //copy to host

        //use hostForeground
    }

Finally, free the device memory

    cudaFree(foreground);
    cudaFree(background);
}

Here is a simple subtraction kernel..

__global__ void substruct_kernel(uint *foreground, uint *background, int width, int height)
{
    //global pixel coordinates of this thread
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int idy = threadIdx.y + blockDim.y * blockIdx.y;

    //skip threads that fall outside the image
    if (idx < width && idy < height)
       foreground[idx + idy * width] -= background[idx + idy * width]; //unsigned values wrap around, clamp may be required..
}
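
The kernel comment notes that a clamp may be required: with unsigned pixels, a background value larger than the foreground value wraps around to a very large number instead of going to zero. Below is a minimal sketch of one way to handle that, assuming each array element holds a single intensity value (not packed color channels). The names `subtractClamped` and `subtractReference` are made up for this illustration, and the macro guard is only there so the same file also builds with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Let the same source compile with nvcc and with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Saturating subtraction (hypothetical helper): returns 0 instead of
// wrapping around when the background value exceeds the foreground value.
__host__ __device__ inline unsigned int subtractClamped(unsigned int fg, unsigned int bg)
{
    return fg > bg ? fg - bg : 0u;
}

// Host-side reference (hypothetical helper), useful for checking the
// kernel output on a small test image.
inline void subtractReference(std::vector<unsigned int> &foreground,
                              const std::vector<unsigned int> &background)
{
    assert(foreground.size() == background.size());
    for (std::size_t i = 0; i < foreground.size(); ++i)
        foreground[i] = subtractClamped(foreground[i], background[i]);
}
```

Inside the kernel, the `-=` line could then become `foreground[i] = subtractClamped(foreground[i], background[i]);` with `i = idx + idy * width`. Whether clamping to zero or taking an absolute difference is the better choice depends on what the subtraction result is used for.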

I do suggest using a library for such simple operations. A BLAS library or the Thrust library may be options.
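
As a rough illustration of the Thrust option, the sketch below keeps the background in a `thrust::device_vector` that stays on the GPU across frames and subtracts it element-wise with `thrust::transform`. The frame size, the fill values, and the three-iteration loop standing in for the video source are placeholders, not part of the original answer.

```cuda
#include <cstddef>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>

int main()
{
    const int width = 640, height = 480;  // assumed frame size
    const std::size_t n = static_cast<std::size_t>(width) * height;

    thrust::host_vector<unsigned int> hostBackground(n, 10u);  // placeholder data
    thrust::host_vector<unsigned int> hostForeground(n);

    // Uploaded once; the data stays on the device for the vector's lifetime.
    thrust::device_vector<unsigned int> background(hostBackground);
    thrust::device_vector<unsigned int> foreground(n);

    for (int frame = 0; frame < 3; ++frame) {  // stand-in for the video loop
        // Placeholder for reading a real frame into host memory.
        thrust::fill(hostForeground.begin(), hostForeground.end(), 50u + frame);

        // Only the foreground is copied to the device each iteration.
        thrust::copy(hostForeground.begin(), hostForeground.end(), foreground.begin());

        // foreground[i] = foreground[i] - background[i]
        thrust::transform(foreground.begin(), foreground.end(), background.begin(),
                          foreground.begin(), thrust::minus<unsigned int>());

        // Copy the result back to the host for further use.
        thrust::copy(foreground.begin(), foreground.end(), hostForeground.begin());
    }
    return 0;
}
```

`thrust::minus` wraps around on unsigned underflow just like the hand-written kernel, so a custom functor would be needed if clamping matters. The device vectors release their memory automatically when they go out of scope.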
