c - gcd计算优化

Question

我有一个算法，它很大程度上基于使用有符号整数的最大公约数，所以我试图让它更快。请不要告诉我这种优化是不必要的，因为我已经将性能提高了 50%。我没有看到任何优化的潜力，但我可能错了。

在以下代码中，断言 b != 0

Integer gcd(Integer a, Integer b)
{
   if ( a == 0 )
   {
      return b;
   }
   if ( a == b )
   {
      return a;
   }
   const Integer mask_a = (a >> (sizeof(Integer) * 8 - 1));
   a = (a + mask_a) ^ mask_a; // a = |a|
   const Integer mask_b = (b >> (sizeof(Integer) * 8 - 1));
   b = (b + mask_b) ^ mask_b; // b = |b|
   if ( ~a & 1 )
   {
      if ( b & 1 )
      {
         // 2 divides a but not b
         return gcd(a >> 1, b);
      }
      else
      {
         // both a and b are divisible by 2, thus gcd(a, b) == 2 * gcd( a / 2, b / 2)
         return gcd(a >> 1, b >> 1) << 1;
      }
   }
   if ( ~b & 1 )
   {
      // 2 divides b, but not a
      return gcd(a, b >> 1);
   }
   if ( a > b )
   {
      // since both a and b are odd, a - b is divisible by 2
      // gcd(a, b) == gcd( a - b, b)
      return gcd((a - b) >> 1, b);
   }
   else
   {
      // since both a and b are odd, a - b is divisible by 2
      // gcd(a, b) == gcd( a - b, b)
      return gcd((b - a) >> 1, a);
   }
}

附带说明：整数是int,long或long long. 它是上面某处的typedef。

如您所见，在将 a 和 b 带到其各自的绝对值方面进行了优化，这在我的机器上运行良好（不一定在所有 afaik 上）。我有点不喜欢分支混乱。有什么方法可以改善吗？

score 3 · Accepted Answer

另一种变体，二进制 GCD，带有班次查找表，

typedef unsigned /* long long int */ UInteger;

Integer lookup(Integer a, Integer b) {
    static const int lut[] =
        { 8, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 6, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 7, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 6, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        };
    if (a == 0) return b;
    const Integer mask_a = a >> (sizeof(Integer) * 8 - 1);
    UInteger a1 = (a + mask_a) ^ mask_a;
    const Integer mask_b = b >> (sizeof(Integer) * 8 - 1);
    UInteger b1 = (b + mask_b) ^ mask_b;

    int sa = 0, sb = 0, shift;
    do {
        shift = lut[a1 & 0xFF];
        a1 >>= shift;
        sa += shift;
    }while(shift == 8);
    do {
        shift = lut[b1 & 0xFF];
        b1 >>= shift;
        sb += shift;
    }while(shift == 8);
    // sa holds the amount to shift the result by, the smaller of the trailing zeros counts
    if (sa > sb) sa = sb;
    // now a1 and b1 are both odd
    if (a1 < b1) {
        UInteger tmp = a1;
        a1 = b1;
        b1 = tmp;
    }
    while(b1 > 1) {
        do {
            a1 -= b1;
            if (a1 == 0){
                return (b1 << sa);
            }
            do {
                a1 >>= (shift = lut[a1 & 0xFF]);
            }while(shift == 8);
        }while(b1 < a1);
        do {
            b1 -= a1;
            if (b1 == 0){
                return (a1 << sa);
            }
            do {
                b1 >>= (shift = lut[b1 & 0xFF]);
            }while(shift == 8);
        }while(a1 < b1);
    }
    return (1 << sa);
}

在我的机器上，这比欧几里得算法快一点，欧几里得算法比递归二进制 GCD 或没有移位查找表的算法快很多。具有 16 元素查找表的版本比 Euclid 稍慢。

但是，如果您的输入非常小，正如您在评论中所说，为 GCD 本身计算查找表可能会更快，至少对于小的a和b（例如0 <= a,b <= 50），并且只有在输入为时才回退到计算大于查找表所允许的。

score 2 · Accepted Answer

我已经做了一些测试和简单的算法，如下所示：

int gcd(int a,int b)
{
    while(1)
    {
        int c = a%b;
        if(c==0)
          return abs(b);
        a = b;
        b = c;
    }
}

比您的代码快 5 倍左右。

试试这个：

Integer gcd( Integer a, Integer b )
{
    if( a == b )
        return a;

    if( a == 0 )
        return b;

    if( b == 0 )
        return a;

    while( b != 0 ) { 
        Integer t = b;
        b = a % b;
        a = t;
    }

    return a;
}

开头是@unkulunkulu 的代码 + 你的 IF。如果您的输入有很多琐碎的案例，它会更快。不幸的是，如果是这样的话，它不会比你的代码快多少，你也无能为力。

score 1 · Accepted Answer

您可以通过用循环替换递归来优化它。即使您的编译器将尾递归调用优化为跳转，您仍然会在a == 0递归调用中进行不必要的检查和绝对值计算。

要处理非尾递归gcd(a >> 1, b >> 1) << 1，您必须引入一个累加器变量，该变量从零开始，用于将 . 之前的结果左移return。

示例（未经测试）：

Integer gcd(Integer a, Integer b)
{
    const Integer mask_a = a >> (sizeof(Integer) * 8 - 1);
    a = (a + mask_a) ^ mask_a;
    const Integer mask_b = b >> (sizeof(Integer) * 8 - 1);
    b = (b + mask_b) ^ mask_b;

    int shift = 0;
    while (a != 0 && a != b) {
        if (~a & 1) {
            a >>= 1;
            if (!(b & 1)) {
                b >>= 1;
                shift++;
            }
        } else if (~b & 1) {
            b >>= 1;
        } else if (a > b) {
            a = (a - b) >> 1;
        } else {
            b = (b - a) >> 1; // the error was here and i have to write 6 chars about it, formerly it was a = (b - a) >> 1;
        }
    }
    return b << shift;
}

score 1 · Accepted Answer

如果您的编译器支持它，您可以使用内置函数，例如likely()or unlikely() for each if()，这将允许编译器进行更多优化。

这些函数是在 Linux 内核中定义的。使用 gcc 将其替换为__builtin_expect().

c - gcd计算优化

4 回答 4

Related

Reference