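I'm working on a voxel ray-casting engine with OpenCL, and I'm trying to do something similar to Crassin's GigaVoxels. In the paper, they use an octree to store the voxel data. For now I'm just trying to descend through the octree until I reach a leaf that contains the render data.
I made two implementations: one in OpenCL on the GPU and one in C++ on the CPU. The problem is that on the GPU, the algorithm descends through the wrong number of levels before it reaches a leaf of the octree. The CPU version gives the correct results. The algorithm is the same in both versions, and the code is nearly identical.
Do you have any idea what the problem might be? Could it be a hardware issue, an OpenCL issue, or am I doing something wrong? I get the same results on three different nVidia GPUs.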
Here is the C++ code:
// Calculate actual ray stepping position
glm::vec4 pos = eyeRay_o + eyeRay_d * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = GetLeafBit(octreeNodes[0]);
//get children address of root
uint childrenAddress = GetChildAddress(octreeNodes[0]);
while (iterations < 30) {
    iterations++;
    // Calculate subdivision offset
    offset = (uint)(pos.x * 2) + (uint)(pos.y * 2) * 2 + (uint)(pos.z * 2) * 4;
    if (leafFlag == 1) {
        //return some colour and exit the loop
        break;
    }
    else
    {
        glm::uvec4 off = glm::uvec4(pos.x * 2, pos.y * 2, pos.z * 2, pos.w * 2);
        pos.x = 2 * pos.x - off.x;
        pos.y = 2 * pos.y - off.y;
        pos.z = 2 * pos.z - off.z;
        pos.w = 2 * pos.w - off.w;
    }
    // Extract node data from the children
    finalAddress = childrenAddress + offset;
    leafFlag = GetLeafBit(nodes[finalAddress]);
    childrenAddress = GetChildAddress(nodes[finalAddress]);
}
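For reference, here is a minimal, self-contained C++ sketch of the same descent that can be run on a tiny hand-built octree. It assumes the node layout implied by GetLeafBit/GetChildAddress (bit 1 = leaf flag, bits 2..31 = index of the first of 8 contiguous children) and that the sample position is already in [0,1) relative to the root cell. DescendToLeaf, LeafBit, ChildAddress and Vec3 are hypothetical names, and the per-level step stays in float throughout:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Same bit operations as GetLeafBit / GetChildAddress above (assumed layout):
// bit 1 = leaf flag, bits 2..31 = index of the first of 8 contiguous children.
inline uint32_t LeafBit(uint32_t v) { return (v >> 1) & 1u; }
inline uint32_t ChildAddress(uint32_t v) { return v >> 2; }

struct Vec3 { float x, y, z; };

// Descends from the root (node 0) to the leaf containing p, with p in [0,1)^3
// relative to the root cell. Returns the number of levels descended (0 if the
// root is a leaf) and stores the index of the leaf node in leafIndex.
int DescendToLeaf(const std::vector<uint32_t>& nodes, Vec3 p,
                  uint32_t& leafIndex, int maxLevels = 30) {
    assert(p.x >= 0.0f && p.x < 1.0f && p.y >= 0.0f && p.y < 1.0f &&
           p.z >= 0.0f && p.z < 1.0f);
    uint32_t index = 0;
    int level = 0;
    while (level < maxLevels && LeafBit(nodes[index]) == 0) {
        // Which half of the current cell p falls in along each axis (0 or 1).
        const float fx = std::floor(p.x * 2.0f);
        const float fy = std::floor(p.y * 2.0f);
        const float fz = std::floor(p.z * 2.0f);
        const uint32_t offset = static_cast<uint32_t>(fx) +
                                static_cast<uint32_t>(fy) * 2u +
                                static_cast<uint32_t>(fz) * 4u;
        // Rescale p into the child's local [0,1) coordinates, staying in float.
        p.x = 2.0f * p.x - fx;
        p.y = 2.0f * p.y - fy;
        p.z = 2.0f * p.z - fz;
        index = ChildAddress(nodes[index]) + offset;
        ++level;
    }
    leafIndex = index;
    return level;
}
```

This counts levels descended rather than loop iterations, but otherwise follows the same order as the loop above: pick the child from the current local position, rescale the position, then fetch the child node. Running it on the CPU against the same node buffer the kernel receives gives an expected depth per sample to compare with.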
Here is the OpenCL code:
// Calculate actual ray stepping position
float4 position = rayOrigin + rayDirection * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = extractOctreeNodeLeaf(octreeNodes[0]);
//get children address of root
uint childrenAddress = extractOctreeNodeAddress(octreeNodes[0]);
//position will be in the [0, 1] interval
//size of octree is 1
while (iterations < 30) {
    iterations++;
    //calculate the index of the next child based on the position in the current subdivision
    offset = (uint)(position.x * 2) + (uint)(position.y * 2) * 2 + (uint)(position.z * 2) * 4;
    if (leafFlag == 1) {
        //return some colour and exit the loop
        break;
    }
    else
    {
        //transform the position inside the parent
        //to the position inside the child subdivision
        //size of child will be considered to be 1
        uint4 off;
        off.x = floor(position.x * 2);
        off.y = floor(position.y * 2);
        off.z = floor(position.z * 2);
        off.w = floor(position.w * 2);
        position = 2 * position - off;
    }
    // Extract node data from the children
    finalAddress = childrenAddress + offset;
    leafFlag = extractOctreeNodeLeaf(octreeNodes[finalAddress]);
    //each node has an index to an array of 8 children - the index points to the first child
    childrenAddress = extractOctreeNodeAddress(octreeNodes[finalAddress]);
}
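One difference from the C++ version: in the kernel, `off` is a `uint4` while `position` is a `float4`, so `position = 2 * position - off;` mixes two vector types. OpenCL C does not perform implicit conversions between built-in vector types (an explicit `convert_float4(off)` would be needed), so it may be worth keeping `off` in floating point, e.g. `float4 off = floor(position * 2.0f);`. As a CPU-side sanity check of that rewrite, the C++ sketch below (Float4, StepTruncate, StepFloor and StepsAgree are hypothetical names) checks that the truncating-uint step used in the C++ version and a float-only floor step produce the same child offset and local position for points in [0,1):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Float4 { float x, y, z, w; };

// Step as in the C++ version: truncate 2*p to unsigned ints (like the
// glm::uvec4 construction) and subtract them back as floats.
inline uint32_t StepTruncate(Float4& p) {
    const uint32_t ox = static_cast<uint32_t>(p.x * 2.0f);
    const uint32_t oy = static_cast<uint32_t>(p.y * 2.0f);
    const uint32_t oz = static_cast<uint32_t>(p.z * 2.0f);
    const uint32_t ow = static_cast<uint32_t>(p.w * 2.0f);
    p.x = 2.0f * p.x - static_cast<float>(ox);
    p.y = 2.0f * p.y - static_cast<float>(oy);
    p.z = 2.0f * p.z - static_cast<float>(oz);
    p.w = 2.0f * p.w - static_cast<float>(ow);
    return ox + oy * 2u + oz * 4u;
}

// Float-only step, mirroring `float4 off = floor(position * 2.0f);
// position = 2.0f * position - off;` in OpenCL C.
inline uint32_t StepFloor(Float4& p) {
    const float fx = std::floor(p.x * 2.0f);
    const float fy = std::floor(p.y * 2.0f);
    const float fz = std::floor(p.z * 2.0f);
    const float fw = std::floor(p.w * 2.0f);
    const uint32_t offset = static_cast<uint32_t>(fx) +
                            static_cast<uint32_t>(fy) * 2u +
                            static_cast<uint32_t>(fz) * 4u;
    p.x = 2.0f * p.x - fx;
    p.y = 2.0f * p.y - fy;
    p.z = 2.0f * p.z - fz;
    p.w = 2.0f * p.w - fw;
    return offset;
}

// True if both steps give the same offset and resulting position at the
// centre of every cell of a samples^3 grid covering [0,1)^3.
bool StepsAgree(int samples) {
    assert(samples > 0);
    for (int i = 0; i < samples; ++i)
        for (int j = 0; j < samples; ++j)
            for (int k = 0; k < samples; ++k) {
                Float4 a{(i + 0.5f) / samples, (j + 0.5f) / samples,
                         (k + 0.5f) / samples, 0.0f};
                Float4 b = a;
                if (StepTruncate(a) != StepFloor(b)) return false;
                if (a.x != b.x || a.y != b.y || a.z != b.z || a.w != b.w)
                    return false;
            }
    return true;
}
```

The two variants only agree because the coordinates are non-negative, where truncation and floor coincide; for a sample that falls outside the root cell (negative coordinates) they would differ.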
Here is extractOctreeNodeAddress (together with the leaf-flag function), as requested.
Both functions just do some bit manipulation:
OpenCL version:
inline char extractOctreeNodeLeaf(uint value) {
    value = value >> 1;
    return value & 1;
}
inline uint extractOctreeNodeAddress(uint value) {
    return value >> 2;
}
C++ version:
inline byte GetLeafBit(uint value)
{
    value = value >> 0x1;
    return value & 0x1;
}
inline uint GetChildAddress(uint value)
{
    return value >> 0x2;
}
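The decoders imply this node layout: bit 1 holds the leaf flag and bits 2..31 hold the index of the node's first child (children stored as 8 contiguous entries); bit 0 is not read by either function, and its meaning isn't shown here. When building small test octrees to feed both versions, a matching encoder can be handy. PackNode below is a hypothetical helper, not part of the code above:

```cpp
#include <cassert>
#include <cstdint>

// Decoders using the same bit operations as GetLeafBit / GetChildAddress.
inline uint32_t LeafBitOf(uint32_t v) { return (v >> 1) & 1u; }
inline uint32_t ChildAddressOf(uint32_t v) { return v >> 2; }

// Hypothetical encoder for the layout above. The child address must fit in
// 30 bits; bit 0 is passed through unchanged because its meaning is unknown.
inline uint32_t PackNode(uint32_t childAddress, bool isLeaf, uint32_t bit0 = 0) {
    assert(childAddress < (1u << 30));
    return (childAddress << 2) | ((isLeaf ? 1u : 0u) << 1) | (bit0 & 1u);
}
```

Round-tripping a few values through PackNode and the decoders on both sides is a cheap way to rule out the bit layout as the source of the depth mismatch.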
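Hi, I found something interesting. I tried testing different variables by hand, comparing the CPU and GPU versions at one exact pixel with the same camera position and orientation. With the code below as it is now, the pixel is printed white, meaning the value is greater than 5.5, which is completely wrong compared to the CPU implementation. But if I comment out the last if block and uncomment the first one, I get red (i.e. pos.x matches the CPU value within the epsilon)... I can't explain this. Any ideas?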
if ((x == 265) && (y == 209)) {
    /*float epsilon = 0.01f;
    float4 stuff = (float4)(0.7604471f, 0.9088342f, 0.9999924f, 0);
    if(fabs(pos.x - stuff.x) < epsilon)
        temp = (float4)(1, 0, 0, 1);
    else
        temp = (float4)(1, 1, 1, 1);
    break;*/
    if(pos.x > 5.5)
    {
        temp = (float4)(1, 1, 1, 1);
        break;
    }
}
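Instead of comparing one hand-picked pixel, one option is to have both implementations write a per-pixel float4 (for example the local position when the descent stops) into a buffer, read the GPU buffer back, and diff it against the CPU one. Below is a minimal C++ sketch of the diff, assuming both buffers are row-major and the same size; Float4 and FirstMismatch are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// |a - b| <= eps; false when either value is NaN, so NaNs count as mismatches.
inline bool Close(float a, float b, float eps) { return std::fabs(a - b) <= eps; }

// Index of the first element whose components differ by more than eps, or -1
// if the buffers match. If the common prefix matches but the sizes differ,
// returns the length of the shorter buffer.
std::ptrdiff_t FirstMismatch(const std::vector<Float4>& cpu,
                             const std::vector<Float4>& gpu, float eps) {
    assert(eps >= 0.0f);
    const std::size_t n = std::min(cpu.size(), gpu.size());
    for (std::size_t i = 0; i < n; ++i) {
        const Float4& a = cpu[i];
        const Float4& b = gpu[i];
        if (!Close(a.x, b.x, eps) || !Close(a.y, b.y, eps) ||
            !Close(a.z, b.z, eps) || !Close(a.w, b.w, eps))
            return static_cast<std::ptrdiff_t>(i);
    }
    return cpu.size() == gpu.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

For a mismatch index i in an image of width W, the pixel is x = i % W, y = i / W. Writing the value into a buffer, rather than branching on it to pick a colour, also keeps the debugging code from changing the control flow of the kernel being debugged.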