opencl - Different behavior between OpenCL C99 and C++ with the same real-time voxel raycaster implementation

Question

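I am working with OpenCL on a voxel raycasting engine. I am trying to do something similar to Crassin's GigaVoxels. In that paper, they use an octree to store the voxel data. For the moment I am only trying to descend through the octree until I reach a leaf that contains rendering data.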

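I have written two implementations: one in OpenCL on the GPU and another in C++ on the CPU. The problem is that on the GPU the algorithm goes through the wrong number of levels before it reaches a leaf in the octree. The CPU version gives correct results. The algorithm is the same in both versions and the code is almost identical.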

Do you know what the problem could be? Is it a hardware issue, an OpenCL issue, or am I doing something wrong? I get the same results on three different nVidia GPUs.

Here is the C++ code:

// Calculate actual ray stepping position
glm::vec4 pos = eyeRay_o + eyeRay_d * t;

uint offset = 0;
//check if root is leaf
uint leafFlag = GetLeafBit(octreeNodes[0]);
//get children address of root
uint childrenAddress = GetChildAddress(octreeNodes[0]);

while (iterations < 30) {  
    iterations++; 

    // Calculate subdivision offset
    offset = (uint)(pos.x * 2) + (uint)(pos.y * 2) * 2 + (uint)(pos.z * 2) * 4;
     
    if (leafFlag == 1) {
         //return some colour and exit the loop
         break;
    }
    else 
    {
         glm::uvec4 off = glm::uvec4(pos.x * 2, pos.y * 2, pos.z * 2, pos.w * 2);
         pos.x = 2 * pos.x - off.x;
         pos.y = 2 * pos.y - off.y;
         pos.z = 2 * pos.z - off.z;
         pos.w = 2 * pos.w - off.w;   
    }

    // Extract node data from the children
    finalAddress = childrenAddress + offset;    
    leafFlag = GetLeafBit(nodes[finalAddress]);
    childrenAddress = GetChildAddress(nodes[finalAddress]);
}

Here is the OpenCL code:

// Calculate actual ray stepping position
float4 position = rayOrigin + rayDirection * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = extractOctreeNodeLeaf(octreeNodes[0]);
//get children address of root
uint childrenAddress = extractOctreeNodeAddress(octreeNodes[0]);

//position will be in the [0, 1] interval
//size of octree is 1
while (iterations < 30) {  
    iterations++; 

    //calculate the index of the next child based on the position in the current subdivision
    offset = (uint)(position.x * 2) + (uint)(position.y * 2) * 2 + (uint)(position.z * 2) * 4;
     
    if (leafFlag == 1) {
        //return some colour and exit the loop
        break;
    }
    else 
    {
         //transform the position inside the parent 
         //to the position inside the child subdivision
         //size of child will be considered to be 1
         uint4 off; 
         off.x = floor(position.x * 2);
         off.y = floor(position.y * 2);
         off.z = floor(position.z * 2);
         off.w = floor(position.w * 2);
         position = 2 * position - off;  
    }
     
    // Extract node data from the children
    finalAddress = childrenAddress + offset; 
    leafFlag = extractOctreeNodeLeaf(octreeNodes[finalAddress]);
    //each node has an index to an array of 8 children - the index points to the first child
    childrenAddress = extractOctreeNodeAddress(octreeNodes[finalAddress]);
}

Here is extractOctreeNodeAddress, as requested:

Both functions just do some bit manipulation:

OpenCL version:

inline char extractOctreeNodeLeaf(uint value) {
 value = value >> 1;
 return value & 1;
}

inline uint extractOctreeNodeAddress(uint value) {
 return value >> 2;
}

C++ version:

inline byte GetLeafBit(uint value)
{
 value = value >> 0x1;
 return value & 0x1;
}

inline uint GetChildAddress(uint value)
{
 return value >> 0x2;
}
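
For readers following along, the bit layout implied by these helpers can be checked on the host with a few constants. This is only a sketch: the names `leafBit`, `childAddress` and `packNode` are hypothetical, and the layout (bit 1 = leaf flag, bits 2 and up = index of the first child) is inferred from the shifts above. The post does not say what bit 0 holds.

```cpp
#include <cstdint>

// Layout inferred from the helpers in the question:
//   bit 1      -> leaf flag
//   bits 2..31 -> index of the first of the node's 8 children
// Bit 0 is not described in the post and is left untouched here.
constexpr std::uint32_t leafBit(std::uint32_t node) {
    return (node >> 1) & 1u;
}

constexpr std::uint32_t childAddress(std::uint32_t node) {
    return node >> 2;
}

// Hypothetical inverse of the two decoders, useful only for building test values.
constexpr std::uint32_t packNode(std::uint32_t address, bool isLeaf) {
    return (address << 2) | (static_cast<std::uint32_t>(isLeaf) << 1);
}
```

Decoding the same packed values on the CPU and in the kernel is a cheap way to rule out the helpers themselves before looking at the descent loop.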

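Hi, I found something interesting. I tried testing different variables manually, comparing their values in the CPU and GPU versions for a single exact pixel and a fixed camera position and orientation. In the code below, if I run the program as it is now, the pixel is drawn white, which means the value is > 5.5 (completely wrong compared with the CPU implementation). But if I comment out the last if block and uncomment the first one, the result I get is red... This is rather inexplicable to me. Any ideas?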

if ((x == 265) && (y == 209)) {
    /*float epsilon = 0.01f;
    float4 stuff = (float4)(0.7604471f, 0.9088342f, 0.9999924f, 0);
    if(fabs(pos.x - stuff.x) < epsilon)  
        temp = (float4)(1, 0, 0, 1);
    else
        temp = (float4)(1, 1, 1, 1);

    break;*/

    if(pos.x > 5.5)
    {
        temp = (float4)(1, 1, 1, 1);
        break;
    }
}

score 1 · Accepted Answer

The main problem was the implicit conversion from float4 to uint4.

Doing the conversions element by element (still implicitly) solved the problem.
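
The answer does not include code, so the following is only a sketch of what a per-component descent step might look like, written in C++ so it can be run on the host. `Vec4`, `DescentStep` and `descendOneLevel` are hypothetical names, and every float-to-uint and uint-to-float conversion is spelled out explicitly so that nothing depends on how a compiler handles mixed float4/uint4 arithmetic. In an OpenCL kernel the analogous explicit form would presumably use the built-in `convert_float4(off)` in the expression `2.0f * position - ...`; that is an assumption about one possible fix, not the poster's actual code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for OpenCL's float4.
using Vec4 = std::array<float, 4>;

struct DescentStep {
    std::uint32_t childIndex; // which of the 8 children contains the point
    Vec4 localPos;            // position re-expressed inside that child's unit cube
};

// One level of octree descent for a position with components in [0, 1).
// Each conversion is written out per component, so no vector-wide
// float <-> uint conversion is left to the compiler.
inline DescentStep descendOneLevel(const Vec4& pos) {
    DescentStep step{};
    std::uint32_t off[4];
    for (int i = 0; i < 4; ++i) {
        off[i] = static_cast<std::uint32_t>(pos[i] * 2.0f); // 0 or 1
        step.localPos[i] = 2.0f * pos[i] - static_cast<float>(off[i]);
    }
    // Same child numbering as in the question: x + 2*y + 4*z.
    step.childIndex = off[0] + off[1] * 2u + off[2] * 4u;
    return step;
}
```

With the sample position from the debugging snippet in the question, (0.7604471, 0.9088342, 0.9999924, 0), this step selects child 7 and every component of the new position stays below 1. A GPU value above 5.5, as reported in the question, means the position has left that range, which is consistent with a conversion going wrong in this step.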
