c++ - Implementing a simple Z-buffer in CUDA

Question

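I have a 3D point cloud, and I project its points to pixels on an image plane. Since some 3D points are mapped to the same pixel, I only want the pixel with the lowest Z value relative to my camera. I use a Z-buffer (a float array) to keep track of my depth values. Here is some pseudocode: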

// Initialize z-Buffer with max depth (99999.9f)
// Go through every point of point cloud...
// Project Point (x,y,z) to image plane (u,v) 
int newIndex = v*imgWidth+u;
float oldDepth = zbuffer[newIndex];

if (z < oldDepth){
  zbuffer[newIndex] = z; // put z value in buffer
  outputImg[newIndex] = pointColor[i]; // put pixel in resulting image
}
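For comparison with the CUDA version, here is a minimal C++ sketch of the sequential loop above. It assumes projection has already produced a pixel (u, v) and a depth z for every point. `ProjectedPoint`, `zbufferSequential` and the packed `uint32_t` color are hypothetical choices for illustration, not the asker's code. Because a single thread finishes each read-compare-write before the next point is processed, no update can be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input record: a point already projected to pixel (u, v) with depth z.
struct ProjectedPoint {
    int u, v;
    float z;
    std::uint32_t color;  // packed color, e.g. 0x00RRGGBB
};

// Hypothetical helper: sequential z-buffer. For each pixel, keep the color of
// the nearest point. Only one thread touches the buffers, so each
// read-compare-write finishes before the next one starts.
void zbufferSequential(const std::vector<ProjectedPoint>& points,
                       int imgWidth, int imgHeight,
                       std::vector<float>& zbuffer,
                       std::vector<std::uint32_t>& outputImg) {
    assert(imgWidth > 0 && imgHeight > 0);
    const std::size_t numPixels = static_cast<std::size_t>(imgWidth) * imgHeight;
    zbuffer.assign(numPixels, 99999.9f);  // "max depth" value from the question
    outputImg.assign(numPixels, 0u);      // background color

    for (const ProjectedPoint& p : points) {
        // Skip points that project outside the image.
        if (p.u < 0 || p.u >= imgWidth || p.v < 0 || p.v >= imgHeight) continue;
        const std::size_t newIndex = static_cast<std::size_t>(p.v) * imgWidth + p.u;
        if (p.z < zbuffer[newIndex]) {
            zbuffer[newIndex] = p.z;        // the nearer point wins the depth
            outputImg[newIndex] = p.color;  // and its color goes to the pixel
        }
    }
}
```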

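I have a single-core CPU version that works perfectly.

The CUDA version looks fine and is extremely fast, but the areas where the z-test actually matters are very "stripy", which, I think, means that some background points are overwriting foreground pixels. Also, when I look at the color image, I see random color stripes containing colors that do not exist in the image.

The CUDA version looks more like this: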

//Initialize, kernel, project, new coordinates...

const float oldDepth = outputDepth[v * outputMaxWidth + u];

if (z < oldDepth){
  outputDepth[v * outputMaxWidth + u] = z;
  const int inputColorIndex = yIndex * inputImageStep + 3*xIndex;
  const int outputColorIndex = yIndex * outputImageStep + 3*xIndex;
  outputImage[outputColorIndex] = inputImage[inputColorIndex]; //B
  outputImage[outputColorIndex + 1] = inputImage[inputColorIndex + 1]; //G
  outputImage[outputColorIndex + 2] = inputImage[inputColorIndex + 2]; //R
}

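I think concurrency is the problem here. One thread might write the nearest z value for that pixel into the Z-buffer, but at the same time another thread reads the old value and then overwrites the correct one.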
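The suspected race can be replayed deterministically without a GPU. In the plain C++ sketch below (hypothetical `replayInterleaving`, no real threads), every "thread" first reads the buffer and compares, and only then do the writes land, one after another. This is just one possible schedule, but nothing in the kernel forbids it, and under it the last writer wins regardless of its depth. The same kind of interleaving across the three separate B, G and R byte writes is one plausible source of colors that do not exist in the input image.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one unsynchronized "thread": its depth and the
// outcome of its comparison against the value it read.
struct NaiveUpdate {
    float z;
    bool willWrite;
};

// Hypothetical helper: replays one possible GPU schedule of the kernel's
// read-compare-write. Phase 1: every thread reads the buffer and compares.
// Phase 2: the writes land in thread order, so the last writer wins.
float replayInterleaving(float initialDepth, const std::vector<float>& depths) {
    float buffer = initialDepth;
    std::vector<NaiveUpdate> pending;
    for (float z : depths) {
        const float oldDepth = buffer;  // nobody has written yet
        pending.push_back({z, z < oldDepth});
    }
    for (const NaiveUpdate& u : pending) {
        if (u.willWrite) buffer = u.z;  // no re-check against the newer value
    }
    return buffer;
}
```

Replaying depths 1, 7, 3, 40, 5 against an initial depth of 999 returns 5 rather than 1: all five comparisons see 999, so all five threads write, and the last write sticks.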

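How can I prevent this in CUDA?

Edit 1: Reducing the block size from (16,16) to (1,1) produces fewer stripe patterns, but then the image has what look like 1-pixel holes.

Edit 2: Here is a minimal example: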

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

cudaError_t insertToZBuffer(int *z, const int *a, unsigned int size);

__global__ void zbufferKernel(int *z, const int *a)
{
    int i = threadIdx.x;
    if (a[i] < z[0]){
        z[0] = a[i]; //  all mapped to pixel index 0        
    }    
}

int main(){
    for (int i = 0; i < 20; ++i){
        const int arraySize = 5;
        const int a[arraySize] = { 1, 7, 3, 40, 5 }; // depth values that all map to index 0
        int z[arraySize] = { 999 }; // z[0] = 999 (large depth); z[1..4] are zero-initialized

        insertToZBuffer(z, a, arraySize);

        printf("{%d,%d,%d,%d,%d}\n", z[0], z[1], z[2], z[3], z[4]);
        cudaDeviceReset();

    }   
    return 0;
}

cudaError_t insertToZBuffer(int *z, const int *a, unsigned int size){
    int *dev_a = 0;
    int *dev_z = 0;
    cudaError_t cudaStatus;
    cudaStatus = cudaSetDevice(0);
    cudaStatus = cudaMalloc((void**)&dev_z, size * sizeof(int));
    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    cudaStatus = cudaMemcpy(dev_z, z, size * sizeof(int), cudaMemcpyHostToDevice);
    zbufferKernel<<<1, size >>>(dev_z, dev_a);
    cudaStatus = cudaGetLastError();
    cudaStatus = cudaDeviceSynchronize();   
    cudaStatus = cudaMemcpy(z, dev_z, size * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_z);
    cudaFree(dev_a);

    return cudaStatus;
}

The value of z at index 0 should be 1, since that is the minimum, but it is 5, which is the last value of a.
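In CUDA, the usual fix for this minimal kernel is to fuse the comparison and the store into one atomic operation, `atomicMin(&z[0], a[i]);`, which is also where the accepted answer ends up. Since that needs a GPU, the sketch below emulates the idea on the CPU: `std::atomic<int>` with a `compare_exchange_weak` retry loop plays the role of `atomicMin`, and one `std::thread` stands in for each CUDA thread. `atomicMinEmulated` and `minimalExample` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU stand-in for CUDA's atomicMin(int* address, int val).
// Returns the value seen before the update, as atomicMin does.
int atomicMinEmulated(std::atomic<int>& target, int value) {
    int current = target.load();
    // On failure, compare_exchange_weak stores the newer value in `current`,
    // so the comparison is repeated against what another thread just wrote.
    while (value < current && !target.compare_exchange_weak(current, value)) {
    }
    return current;
}

// Hypothetical re-creation of the question's minimal example:
// one std::thread per CUDA thread, all targeting z[0].
int minimalExample(const std::vector<int>& a, int initialDepth) {
    std::atomic<int> z0{initialDepth};
    std::vector<std::thread> threads;
    for (int value : a) {
        threads.emplace_back([&z0, value] { atomicMinEmulated(z0, value); });
    }
    for (std::thread& t : threads) t.join();
    return z0.load();
}
```

With the values from the question, `minimalExample({1, 7, 3, 40, 5}, 999)` returns 1 on every run, because a thread whose compare-exchange fails re-reads the newer value before deciding whether to write.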

score -1 · Accepted Answer

Thanks to the comments, here is how I solved it:

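If the z value is smaller, I write it to my Z-buffer using atomicCAS (reinterpreting the float as an integer). If the current thread has a larger z value, I simply return. Finally, I synchronize all remaining threads (the ones that wrote to the buffer) with __syncthreads() and have each check whether its z value is the one that ended up in the buffer. If it is, I write the point color into the pixel at that position.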
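The atomicCAS approach works because, for non-negative IEEE-754 floats, the raw bit patterns compare in the same order as the values, so a compare-and-swap loop on the integer bits implements a float minimum. One caveat: `__syncthreads()` only synchronizes the threads of a single block, so it cannot wait for points handled by other blocks. A structure that avoids this is two kernel launches, one that only resolves depths and one that then writes colors. The C++ sketch below emulates both passes on the CPU, assuming all depths are non-negative; `depthKey`, `atomicMinDepth`, `Pt` and `renderTwoPass` are hypothetical names, and joining the worker threads stands in for the boundary between the two kernels.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: bit pattern of a non-negative float as uint32_t
// (like __float_as_uint in CUDA). For z >= 0 the unsigned order matches the float order.
std::uint32_t depthKey(float z) {
    assert(z >= 0.0f);        // the ordering trick does not hold for negative depths
    if (z == 0.0f) z = 0.0f;  // map -0.0f to +0.0f, whose bits are all zero
    std::uint32_t bits;
    std::memcpy(&bits, &z, sizeof bits);
    return bits;
}

// Hypothetical helper: atomic float min via a compare-and-swap loop on the
// bit pattern, the CPU counterpart of looping on atomicCAS.
void atomicMinDepth(std::atomic<std::uint32_t>& slot, float z) {
    const std::uint32_t desired = depthKey(z);
    std::uint32_t current = slot.load();
    while (desired < current && !slot.compare_exchange_weak(current, desired)) {
        // on failure `current` now holds the newer value; re-check it
    }
}

struct Pt { int pixel; float z; std::uint32_t color; };  // hypothetical point record

std::vector<std::uint32_t> renderTwoPass(const std::vector<Pt>& pts, int numPixels) {
    std::vector<std::atomic<std::uint32_t>> depth(numPixels);
    for (auto& d : depth) d.store(depthKey(99999.9f));  // "max depth" from the question

    // Pass 1 ("kernel 1"): every point lowers its pixel's depth atomically.
    std::vector<std::thread> workers;
    for (const Pt& p : pts)
        workers.emplace_back([&depth, p] { atomicMinDepth(depth[p.pixel], p.z); });
    for (auto& t : workers) t.join();  // grid-wide barrier = end of kernel 1

    // Pass 2 ("kernel 2"), run serially here: a point writes its color only if
    // its depth is the final one. Points with equal depths would still race on a GPU.
    std::vector<std::uint32_t> color(numPixels, 0u);
    for (const Pt& p : pts)
        if (depthKey(p.z) == depth[p.pixel].load()) color[p.pixel] = p.color;
    return color;
}
```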

Edit: I should have just used atomicMin...
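Plain `atomicMin` on the reinterpreted depth is enough to get the Z-buffer right, but the color still has to follow the winning point. One common way to handle that (a swapped-in technique, not the answer author's code) is to pack the depth bits into the high 32 bits and the point index into the low 32 bits of a 64-bit key and apply a single 64-bit atomic min per point; CUDA offers an `unsigned long long` overload of `atomicMin` on sufficiently recent devices (the CUDA C++ Programming Guide lists the required compute capability). A second pass then copies the color of the winning index, and ties are broken by the smaller index. The sketch below emulates this in C++ with `std::atomic<std::uint64_t>`, assuming non-negative depths; `packKey`, `atomicMin64`, `Point` and `renderPacked` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: key = (depth bits << 32) | point index. For non-negative
// depths a smaller key means a nearer point; equal depths fall back to the smaller index.
std::uint64_t packKey(float z, std::uint32_t index) {
    assert(z >= 0.0f);        // NaN and negative depths are not supported
    if (z == 0.0f) z = 0.0f;  // map -0.0f to +0.0f so its bits sort first
    std::uint32_t bits;
    std::memcpy(&bits, &z, sizeof bits);
    return (static_cast<std::uint64_t>(bits) << 32) | index;
}

// Hypothetical CPU stand-in for a 64-bit atomicMin.
void atomicMin64(std::atomic<std::uint64_t>& slot, std::uint64_t key) {
    std::uint64_t current = slot.load();
    while (key < current && !slot.compare_exchange_weak(current, key)) {
    }
}

struct Point { int pixel; float z; std::uint32_t color; };  // hypothetical layout

std::vector<std::uint32_t> renderPacked(const std::vector<Point>& pts, int numPixels) {
    const std::uint64_t empty = ~std::uint64_t{0};  // "no point yet" (bits of a NaN depth)
    std::vector<std::atomic<std::uint64_t>> keys(numPixels);
    for (auto& k : keys) k.store(empty);

    // Pass 1: one atomic min per point, so depth and winner index never disagree.
    std::vector<std::thread> workers;
    for (std::uint32_t i = 0; i < pts.size(); ++i)
        workers.emplace_back([&keys, &pts, i] {
            atomicMin64(keys[pts[i].pixel], packKey(pts[i].z, i));
        });
    for (auto& t : workers) t.join();

    // Pass 2: each pixel copies the color of the point whose index won.
    std::vector<std::uint32_t> color(numPixels, 0u);
    for (int p = 0; p < numPixels; ++p) {
        const std::uint64_t k = keys[p].load();
        if (k != empty) color[p] = pts[static_cast<std::uint32_t>(k)].color;
    }
    return color;
}
```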
