c - I need a fast 96-bit on 64-bit specific division algorithm for a fixed-point math library

Question

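I'm currently writing a fast 32.32 fixed-point math library. I got addition, subtraction and multiplication working correctly, but I'm quite stuck on division.

A little reminder for those who don't remember: a 32.32 fixed-point number is a number with 32 bits of integer part and 32 bits of fractional part.

The best algorithm I came up with needs 96-bit integer division, which compilers usually don't have built in.

Anyway, here it goes: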

G = 2^32

notation: x is the 64-bit fixed-point number, x1 is its low 32 bits and x2 is its high 32 bits

G*(a/b) = ((a1 + a2*G) / (b1 + b2*G))*G      // Decompose this

G*(a/b) = (a1*G) / (b1 + b2*G) + (a2*G*G) / (b1 + b2*G)

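As you can see, (a2*G*G) does not fit in a regular 64-bit integer: whenever a2 is nonzero it is at least 2^64. If my compiler actually supported uint128_t, I would simply do the following:

((uint128_t)x << 32) / y

Well, it doesn't, and I need a solution. Thanks for your help.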
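For comparison, this is what that one-liner computes once a range check is added. It is only a sketch for compilers that do offer a 128-bit type: `unsigned __int128` is a GCC/Clang extension on 64-bit targets, not standard C, and the name `ufix_div_ref` is made up here. It returns 0 when the divisor is zero or the quotient does not fit in 32.32, which makes it handy as a test oracle for whatever portable routine ends up replacing it.

```c
#include <stdint.h>

/* Hypothetical reference routine: unsigned 32.32 division a / b.
   Requires the GCC/Clang extension type unsigned __int128. */
static int ufix_div_ref(uint64_t a, uint64_t b, uint64_t *out)
{
    if (b == 0)
        return 0;                                   /* division by zero */
    unsigned __int128 q = ((unsigned __int128)a << 32) / b;
    if (q >> 64)
        return 0;                                   /* quotient needs more than 64 bits */
    *out = (uint64_t)q;
    return 1;
}
```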

score 7 · Accepted Answer

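You can break a larger division into multiple chunks that each divide with fewer bits. As another poster already mentioned, the algorithm can be found in Knuth's TAOCP.

However, there's no need to buy the book!

There is code on the Hacker's Delight website that implements the algorithm in C. It was written to do 64-bit unsigned division using only 32-bit arithmetic, so you can't directly cut and paste it. To go from 64-bit to 128-bit, you have to widen all types, masks and constants by a factor of two: for example, a short becomes an int, a 0xffff becomes 0xffffffffll, etc.

After this simple change you should be able to do 128-bit divides.

The code is mirrored on GitHub, but was originally posted on hackersdelight.org (the original link is no longer accessible).

Since your largest values only need 96 bits, one of the 64-bit divisions will always return zero, so you can even simplify the code a bit.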
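As a sketch of what the widened routine might look like: below, the Hacker's Delight `divlu` idea (an adaptation of Knuth's Algorithm D for a two-digit quotient) is written with 32-bit digits held in `uint64_t`, so it divides a 128-bit dividend by a 64-bit divisor. It is written from the algorithm's description rather than copied from the mirrored file, and the names `nlz64`, `divlu128` and `ufix_div_hd` are hypothetical. For 32.32 division the dividend is `a << 32`, whose high word is `a >> 32`; the precondition "high word < divisor" is exactly the condition for the quotient to fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading zero bits in a nonzero 64-bit value. */
static int nlz64(uint64_t x)
{
    int n = 0;
    while ((x & 0x8000000000000000ULL) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Divide the 128-bit value u1:u0 by v using two 32-bit quotient digits.
   Precondition: v != 0 and u1 < v, so the quotient fits in 64 bits. */
static uint64_t divlu128(uint64_t u1, uint64_t u0, uint64_t v, uint64_t *rem)
{
    const uint64_t b = 1ULL << 32;                 /* digit base 2^32 */
    assert(v != 0 && u1 < v);

    int s = nlz64(v);                              /* normalize: top bit of v set */
    v <<= s;
    uint64_t vn1 = v >> 32, vn0 = v & 0xFFFFFFFFULL;

    uint64_t un32 = (s == 0) ? u1 : (u1 << s) | (u0 >> (64 - s));
    uint64_t un10 = u0 << s;
    uint64_t un1 = un10 >> 32, un0 = un10 & 0xFFFFFFFFULL;

    /* Estimate the high quotient digit, then correct the estimate. */
    uint64_t q1 = un32 / vn1;
    uint64_t rhat = un32 - q1 * vn1;
    while (q1 >= b || q1 * vn0 > b * rhat + un1) {
        q1--;
        rhat += vn1;
        if (rhat >= b)
            break;
    }

    /* Partial remainder; its true value is < v, so wrap-around mod 2^64 is harmless. */
    uint64_t un21 = un32 * b + un1 - q1 * v;

    /* Same for the low quotient digit. */
    uint64_t q0 = un21 / vn1;
    rhat = un21 - q0 * vn1;
    while (q0 >= b || q0 * vn0 > b * rhat + un0) {
        q0--;
        rhat += vn1;
        if (rhat >= b)
            break;
    }

    if (rem != NULL)
        *rem = (un21 * b + un0 - q0 * v) >> s;     /* undo the normalization */
    return q1 * b + q0;
}

/* Unsigned 32.32 division a / d; returns 0 on division by zero or overflow. */
static int ufix_div_hd(uint64_t a, uint64_t d, uint64_t *out)
{
    if (d == 0 || (a >> 32) >= d)
        return 0;
    *out = divlu128(a >> 32, a << 32, d, NULL);
    return 1;
}
```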

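Oh - and before I forget: the code only works with unsigned values. To turn a signed division into an unsigned one, you can do something like this (unsigned_divide stands for whatever unsigned 32.32 routine you end up with):

#include <stdint.h>

typedef int64_t fixpoint;   /* signed 32.32 value */

/* the unsigned 32.32 division routine, e.g. the widened Hacker's Delight code */
uint64_t unsigned_divide (uint64_t a, uint64_t b);

fixpoint Divide (fixpoint a, fixpoint b)
{
    // check if the integers are of different sign:
    fixpoint sign_difference = a ^ b;

    // take magnitudes in unsigned arithmetic (abs() takes an int and overflows on INT64_MIN):
    uint64_t ua = (a < 0) ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = (b < 0) ? 0 - (uint64_t)b : (uint64_t)b;

    // do unsigned division:
    uint64_t x = unsigned_divide (ua, ub);

    // if the signs have been different: negate the result.
    if (sign_difference < 0)
    {
       x = 0 - x;
    }

    return (fixpoint)x;
}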

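The website itself is worth a look as well: http://www.hackersdelight.org/

BTW - nice task you're working on. Would you mind telling us what you need the fixed-point library for?

BTW - the ordinary shift-and-subtract algorithm for division would work as well.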
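A minimal sketch of that shift-and-subtract (restoring) division for unsigned 32.32 values, using only 64-bit operations. The function name is hypothetical, and the sketch assumes the caller wants a failure code rather than saturation on overflow. The trick is that the dividend `a * 2^32` is the 128-bit pair `(a >> 32) : (a << 32)`, and once the high word is known to be smaller than the divisor it can serve directly as the starting remainder.

```c
#include <stdint.h>

/* Hypothetical restoring division: computes (a * 2^32) / d one quotient bit
   at a time. Returns 0 on division by zero or if the quotient needs > 64 bits. */
static int ufix_div_shift(uint64_t a, uint64_t d, uint64_t *out)
{
    if (d == 0 || (a >> 32) >= d)
        return 0;

    uint64_t r = a >> 32;      /* high word, already < d: the starting remainder */
    uint64_t lo = a << 32;     /* low word, fed in one bit per step */
    uint64_t q = 0;

    for (int i = 0; i < 64; i++) {
        uint64_t carry = r >> 63;          /* bit that the shift pushes out of r */
        r = (r << 1) | (lo >> 63);         /* bring down the next dividend bit */
        lo <<= 1;
        q <<= 1;
        if (carry || r >= d) {             /* the 65-bit remainder is >= d */
            r -= d;                        /* wraps to the correct value if carry was set */
            q |= 1;
        }
    }
    *out = q;
    return 1;
}
```

Each of the 64 iterations is one shift, one compare and possibly one subtract, so this is simple but slower than the two-digit approach. Because the low 32 bits of `a << 32` are zero, the last 32 iterations only shift in zeros.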

If you target x86, you can implement it using MMX or SSE intrinsics. The algorithm relies only on primitive operations, so it can run quite fast, too.

score 1 · Accepted Answer

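Better self-adjusting answer:
Forgive the C#-isms in this answer, but the following should avoid the overflow in all cases. There is probably a way to find the right shift more quickly, but I'd have to think harder about it than I can right now. This should be reasonably efficient, though. Note that each y >>= 1 in the loop drops the lowest bit of the divisor, so once the loop runs the result is an approximation of the exact quotient: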

int upshift = 32;
ulong mask = 0xFFFFFFFF00000000;
ulong mod = x % y;
while ((mod & mask) != 0)
{
     // Current upshift of the remainder would overflow... so adjust
     y >>= 1;
     mask <<= 1;
     upshift--;

     mod = x % y;
}
ulong div = ((x / y) << upshift) + (mod << upshift) / y;

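Simple but unsafe answer: this calculation can overflow when the x % y remainder is shifted up, if the remainder has any bits set in its high 32 bits, which leads to a wrong answer.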

((x / y) << 32) + ((x % y) << 32) / y

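The first part uses integer division and gives you the high bits of the answer (shifting them up into place).

The second part computes the low bits from the remainder of the high-bit division (the bits that could not be divided any further): it shifts the remainder up and then divides.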
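Because `x % y` is always smaller than `y`, the shift in the simple formula is safe whenever the divisor is at most 2^32 (1.0 in 32.32). A C sketch that applies the formula only in that case; the function name and the choice to reject larger divisors are assumptions of this sketch, not part of the answer. Within that range the result is exact, since x * 2^32 = (x / y) * y * 2^32 + (x % y) * 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: ((x / y) << 32) + ((x % y) << 32) / y, restricted to
   divisors y <= 2^32, where x % y < y <= 2^32 keeps the shifted remainder in range. */
static int ufix_div_simple(uint64_t x, uint64_t y, uint64_t *out)
{
    if (y == 0 || y > (1ULL << 32))
        return 0;                      /* zero divisor, or remainder may need > 32 bits */
    uint64_t q = x / y;                /* integer part of the quotient */
    if (q >> 32)
        return 0;                      /* integer part does not fit in 32 bits */
    *out = (q << 32) + ((x % y) << 32) / y;   /* floor(x * 2^32 / y) */
    return 1;
}
```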

score 0 · Accepted Answer

Quick-n-dirty.

Do the A/B divide with double precision floating point. This gives you C~=A/B. It's only approximate because of floating point precision and 53 bits of mantissa.

Round off C to a representable number in your fixed point system.

Now compute (again with your fixed point) D=A-C*B. This should have significantly lower magnitude than A.

Repeat, now computing D/B with floating point. Again, round the answer to an integer. Add each division result together as you go. You can stop when your remainder is so small that your floating point divide returns 0 after rounding.

You're still not done. Now you're very close to the answer, but the divisions weren't exact. To finalize, you'll have to do a binary search. Using the (very good) starting estimate, see if increasing it improves the error. You basically want to bracket the proper answer and keep dividing the range in half with new tests.

Yes, you could do Newton iteration here, but binary search will likely be easier since you need only simple multiplies and adds using your existing 32.32 precision toolkit.

This is not the most efficient method, but it's by far the easiest to code.
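A simplified C sketch of this idea, for unsigned values only: it takes a single double-precision estimate and goes straight to the binary search, skipping the repeated remainder-and-divide rounds (those only tighten the starting bracket). The "is this guess too big?" test needs a 64x64 to 128-bit multiply, built here from 32-bit halves, which is the same product a 32.32 multiply needs anyway. All function names are hypothetical, and the bracket width is a deliberately generous assumption; if the estimate falls outside it, the search widens to the full range and is merely slower.

```c
#include <stdint.h>

/* 64x64 -> 128-bit unsigned multiply built from 32-bit halves. */
static void mul_64x64(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    uint64_t x0 = x & 0xFFFFFFFFULL, x1 = x >> 32;
    uint64_t y0 = y & 0xFFFFFFFFULL, y1 = y >> 32;
    uint64_t p00 = x0 * y0, p01 = x0 * y1, p10 = x1 * y0, p11 = x1 * y1;
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFULL) + (p10 & 0xFFFFFFFFULL);
    *lo = (mid << 32) | (p00 & 0xFFFFFFFFULL);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* True if q * d <= a * 2^32, i.e. q does not exceed the exact quotient. */
static int quotient_not_too_big(uint64_t q, uint64_t a, uint64_t d)
{
    uint64_t hi, lo;
    mul_64x64(q, d, &hi, &lo);
    uint64_t nhi = a >> 32, nlo = a << 32;
    return hi < nhi || (hi == nhi && lo <= nlo);
}

/* Unsigned 32.32 division a / d: double estimate, then binary search for the
   largest q with q * d <= a * 2^32. Returns 0 on division by zero or overflow. */
static int ufix_div_float(uint64_t a, uint64_t d, uint64_t *out)
{
    if (d == 0 || (a >> 32) >= d)
        return 0;

    double e = (double)a / (double)d * 4294967296.0;      /* approximate quotient */
    uint64_t est = (e >= 18446744073709551616.0) ? UINT64_MAX : (uint64_t)e;

    /* Bracket the estimate (a double carries 53 significant bits). */
    uint64_t delta = (est >> 40) + 2;
    uint64_t lo = (est > delta) ? est - delta : 0;
    uint64_t hi = (est < UINT64_MAX - delta) ? est + delta : UINT64_MAX;
    if (!quotient_not_too_big(lo, a, d))
        lo = 0;                        /* estimate too high: widen the bracket */
    if (quotient_not_too_big(hi, a, d))
        hi = UINT64_MAX;               /* estimate too low: widen the bracket */

    /* Invariant: lo is not too big, and the exact quotient is <= hi. */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2 + 1;
        if (quotient_not_too_big(mid, a, d))
            lo = mid;
        else
            hi = mid - 1;
    }
    *out = lo;
    return 1;
}
```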

score 0 · Accepted Answer

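I like Nils's answer, which is probably the best. It's just long division like we all learned in grade school, except the digits are base 2^32 instead of base 10.

However, you might also consider using Newton's approximation method for division, which iterates on the reciprocal:

  x := x (2 - D * x)

This converges to 1/D, and the quotient is then N * x, where N is the numerator and D is the denominator.

This uses only the multiplies and adds you already have, and it converges very quickly to about 1 ULP of precision. On the other hand, you won't get the exact 0.5-ULP answer in all cases.

Either way, the tricky bit is detecting and handling the overflows.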
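To illustrate the convergence, here is a sketch of the standard Newton step for a reciprocal, x := x (2 - D x), followed by one multiply by the numerator. It runs in double precision purely to show the behaviour; in a 32.32 library each multiply would be the library's own fixed-point multiply, with the overflow handling mentioned above. The function names and the requirement that the caller supply a start value 0 < x0 < 2/D are assumptions of this sketch.

```c
/* Newton's iteration for 1/d: each step squares the relative error 1 - d*x,
   provided the start value satisfies 0 < x0 < 2/d. */
static double newton_reciprocal(double d, double x0, int iterations)
{
    double x = x0;
    for (int i = 0; i < iterations; i++)
        x = x * (2.0 - d * x);
    return x;
}

/* Approximate n / d as n * (1/d). */
static double newton_divide(double n, double d, double x0, int iterations)
{
    return n * newton_reciprocal(d, x0, iterations);
}
```

For d = 3 and x0 = 0.25 the relative error goes 0.25, 0.0625, about 0.0039, about 1.5e-5, about 2.3e-10, so five or six steps reach double precision.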
