c# - Fastest implementation of log2(int) and log2(float)

Question

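The question is

Are there any other (and/or faster) implementations of a basic 2log?

Applications

The log2(int) and log2(float) operations are very useful in many different contexts. To name a few: compression algorithms, 3D engines and machine learning. In almost all of these contexts they are used in low-level code that is called billions of times... The log2(int) operation in particular is very useful.

Because I find myself using log2 all the time, I don't want to single out one specific application I'm working on here. What they all have in common is that log2 is a real performance drain (as performance tests of various applications show). For me, making it as fast as possible is key.

The complete source code for testing all implementations is added at the bottom, so you can see for yourself.

And of course... run your tests at least 3 times and make sure the counters are big enough to hit multiple seconds. I also do the 'add' operation to ensure the whole loop isn't magically removed by the JIT'ter. So let's get started with the real work.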
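
The measuring advice above (repeat the run, use a large counter, accumulate the results so the loop cannot be optimized away) can be sketched as a small reusable helper. This is only an illustration: the names Log2Bench, Measure and LogShiftLoop are made up here and are not part of the test code at the bottom, and calling through a delegate adds overhead of its own, so the numbers are only comparable between implementations measured the same way.

```csharp
using System;
using System.Diagnostics;

public static class Log2Bench
{
    // Simple shift-loop log2, present only so there is something to time.
    public static int LogShiftLoop(int v)
    {
        int r = 0;
        while ((v >>= 1) > 0) { r++; }
        return r;
    }

    // Times log2 over the inputs 1..count-1 and returns the best of `runs` runs.
    // The accumulated sum is handed back through `checksum`, so the JIT cannot
    // treat the loop body as dead code.
    public static TimeSpan Measure(Func<int, int> log2, int count, int runs, out long checksum)
    {
        checksum = 0;
        TimeSpan best = TimeSpan.MaxValue;
        for (int run = 0; run < runs; ++run)
        {
            long sum = 0;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 1; i < count; ++i)
            {
                sum += log2(i);
            }
            sw.Stop();
            checksum = sum;
            if (sw.Elapsed < best) { best = sw.Elapsed; }
        }
        return best;
    }
}
```

A call such as Log2Bench.Measure(Log2Bench.LogShiftLoop, 100000000, 3, out long sum) would then mirror the loops in the test code, with the checksum printed next to the elapsed time.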

Trivial implementation

The trivial implementation of a 2log in C# is:

(int)(Math.Log(x) / Math.Log(2))

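This implementation is trivial, but also very slow. It requires 2 Log operations, which are themselves already quite slow. Of course, we can optimize this by making 1.0/Math.Log(2) a constant.

Note that we need to modify this constant slightly to get the right results (because of floating-point errors), or add a small number to get the correct results. I chose the latter, but it doesn't really matter: the end result is slow in all cases.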
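
To see whether a floating-point variant really lands on the right integer, it can be compared against an exact shift loop around the powers of two, which is where a rounding error would flip the truncated result. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the epsilon is the one used in TestConstant in the test code.

```csharp
using System;

public static class Log2Check
{
    private static readonly double InvLog2 = 1.0 / Math.Log(2.0);

    // Floating-point variant with the small additive correction.
    public static int LogWithEpsilon(int x)
    {
        return (int)(0.00000000001 + Math.Log(x) * InvLog2);
    }

    // Exact reference: index of the highest set bit, found by shifting.
    public static int LogReference(int x)
    {
        int r = 0;
        while ((x >>= 1) > 0) { r++; }
        return r;
    }

    // Counts disagreements with the reference at 2^k - 1, 2^k and 2^k + 1
    // for k = 1..30.
    public static int CountMismatches(Func<int, int> candidate)
    {
        int mismatches = 0;
        for (int k = 1; k <= 30; ++k)
        {
            int p = 1 << k;
            for (int x = p - 1; x <= p + 1; ++x)
            {
                if (candidate(x) != LogReference(x)) { mismatches++; }
            }
        }
        return mismatches;
    }
}
```

Passing the uncorrected formula to CountMismatches as well shows whether the correction is actually needed on a given runtime.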

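Table lookup

A faster solution is to use a lookup table. While you can use a lookup table of any power of 2, I usually use a table size of 256 or 64K entries.

First we create the lookup table: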

lookup = new int[256];
for (int i = 1; i < 256; ++i)
{
    lookup[i] = (int)(Math.Log(i) / Math.Log(2));
}

Next, we implement the 2log as follows:

private static int LogLookup(int i)
{
    if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
    else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
    else if (i >= 0x100) { return lookup[i >> 8] + 8; }
    else { return lookup[i]; }
}

As you can see, table lookup is a much, much faster implementation, but as a drawback it cannot be used to calculate log2(float).
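
For the 64K-entry table size mentioned above, a sketch could look as follows. It is an illustration rather than the measured code: the class name is made up, the table is filled with integer arithmetic instead of Math.Log, and the input is assumed to be a positive int.

```csharp
public static class Log2Table16
{
    // Table[i] = floor(log2(i)) for i in 1..65535; Table[0] stays 0.
    private static readonly byte[] Table = BuildTable();

    private static byte[] BuildTable()
    {
        byte[] t = new byte[1 << 16];
        for (int i = 2; i < t.Length; ++i)
        {
            // log2(i) is one more than log2(i / 2).
            t[i] = (byte)(t[i >> 1] + 1);
        }
        return t;
    }

    // One comparison instead of three, at the cost of a 64 KB table.
    public static int Log2(int i)
    {
        return i >= 0x10000 ? Table[i >> 16] + 16 : Table[i];
    }
}
```

Whether the larger table pays off depends on how much of it stays in cache for the actual input distribution, so it is worth measuring both sizes.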

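Branch removal

As we all know, processors aren't very good at branching, so I figured that the table lookup could be improved by removing the branches. Instead of the bunch of ifs, I introduced a second table that holds the shift amount needed to find the entry in the lookup table: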

nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

private static int LogDoubleLookup(int i)
{
    int n = (i | (i >> 4));
    n = (n | (n >> 2));
    n = (n | (n >> 1));
    n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
    int br = nobranch[n];
    return lookup[i >> br] + br;
}

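If you run this test, you will find that it is actually slower than the if-then-else solution.

And then there was the Intel 80386

Intel understood years ago that this is an important operation, so they implemented bit scan instructions in their processors (BSF, bit scan forward, and BSR, bit scan reverse; BSR is the one that returns the index of the highest set bit). Other processors have similar instructions. This is by far the fastest way to do a 2log that I know of, but unfortunately I know of no way to use these nice functions from C#... I don't like the idea of having an implementation that no longer runs when a new tablet or phone hits the market, and I don't know of any cross-platform solution that lets me use this functionality directly.

Other implementations

As l4V pointed out (thanks!), there are a couple of other implementations, specifically:

Trivial loop. I omitted this because it is trivial and not really fast. Implemented in TestTrivialLoop.
64-bit IEEE / int unions that can be used. Implemented in TestFloat.
DeBruijn lookup table. Implemented in TestDeBruijn.
Binary search. Implemented in TestBinary.

Apart from the fact that I like the name, the DeBruijn lookup table is just as fast as the normal lookup table, making it one of the fastest algorithms here... all the other algorithms I've tried are much slower.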
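
For readers who have not seen the DeBruijn approach: the input is first "smeared" so that every bit below the highest set bit is 1, then multiplied by a constant chosen so that the top 5 bits of the 32-bit product are different for each of the 32 possible smeared values; those 5 bits index a 32-entry table. The sketch below uses the same table and constant as LogDeBruijn in the test code, but takes a uint so the multiplication wraps at 32 bits without a cast (the class name is made up for the example).

```csharp
public static class DeBruijnLog2
{
    private static readonly int[] Table =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    public static int Log2(uint v)
    {
        // Smear the highest set bit downwards: v becomes 2^(k+1) - 1.
        v |= v >> 1;
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;

        // uint * uint wraps modulo 2^32; the top 5 bits select the table entry.
        return Table[(v * 0x07C4ACDDu) >> 27];
    }
}
```

Note that an input of 0 maps to table entry 0 and therefore returns 0, the same as an input of 1.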

Complete test code

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public class Log2Test
{
    public static void TestNaive()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(Math.Log(i) / Math.Log(2.0));
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogTrivialLoop(int v)
    {
        int r = 0;
        while ((v >>= 1) > 0) // unroll for more speed...
        {
            r++;
        }
        return r;
    }

    public static void TestTrivialLoop()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogTrivialLoop(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - loop implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogFloat(int v)
    {
        Helper h = new Helper() { U1 = v, U2 = 0x43300000 };
        h.D -= 4503599627370496.0;
        return (h.U2 >> 20) - 0x3FF;
    }

    public static void TestFloat()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogFloat(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - IEEE float implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct Helper
    {
        [FieldOffset(0)]
        public int U1;
        [FieldOffset(4)]
        public int U2;
        [FieldOffset(0)]
        public double D;
    }

    public static void TestConstant()
    {
        double c = 1.0 / Math.Log(2.0);
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(0.00000000001 + Math.Log(i) * c);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive 2 implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogLookup(int i)
    {
        if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
        else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
        else if (i >= 0x100) { return lookup[i >> 8] + 8; }
        else { return lookup[i]; }
    }

    public static void TestLookup()
    {
        lookup = new int[256];
        for (int i = 1; i < 256; ++i)
        {
            lookup[i] = (int)(Math.Log(i) / Math.Log(2));
        }
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogDoubleLookup(int i)
    {
        int n = (i | (i >> 4));
        n = (n | (n >> 2));
        n = (n | (n >> 1));
        n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
        int br = nobranch[n];
        return lookup[i >> br] + br;
    }

    public static void TestDoubleLookup()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDoubleLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - double table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogBinary(int v)
    {
        /* This is the worst implementation ever... - apparently C# is a slow-branching language

        int[] b = { 0x2, 0xC, 0xF0, 0xFF00, 0x7FFF0000 };
        int[] S = { 1, 2, 4, 8, 16 };

        int r = 0; // result of log2(v) will go here
        for (int i = 4; i >= 0; i--) // unroll for speed...
        {
            if ((v & b[i]) != 0)
            {
                v >>= S[i];
                r |= S[i];
            }
        }
        return r;

         */

        int r = (((v > 0xFFFF)) ? 0x10 : 0); 
        v >>= r;
        int shift = ((v > 0xFF) ? 0x8 : 0); 
        v >>= shift; 
        r |= shift;
        shift = ((v > 0xF) ? 0x4 : 0); 
        v >>= shift;
        r |= shift;
        shift = ((v > 0x3) ? 0x2 : 0); 
        v >>= shift;
        r |= shift;
        r |= (v >> 1);
        return r;
    }

    public static void TestBinary()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogBinary(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - binary search implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static readonly int[] MultiplyDeBruijnBitPosition = new int[32]
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    private static int LogDeBruijn(int v)
    {
        v |= v >> 1; // first round down to one less than a power of 2 
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;

        return MultiplyDeBruijnBitPosition[(uint)(v * 0x07C4ACDDU) >> 27];
    }

    public static void TestDeBruijn()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDeBruijn(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - de Bruijn implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int[] lookup;
    private static readonly int[] nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

    static void Main(string[] args)
    {
        TestConstant();
        TestNaive();
        TestDeBruijn();
        TestBinary();
        TestFloat();
        TestTrivialLoop();
        TestLookup();
        TestDoubleLookup();
        Console.ReadLine();
    }
}

score 5 · Accepted Answer

I took the binary solution already mentioned and removed the branching. I did some testing and it turned out to be 1.3 times faster than DeBruijn.

public static int Log2(int v)
{
    int r = 0xFFFF - v >> 31 & 0x10;
    v >>= r;
    int shift = 0xFF - v >> 31 & 0x8;
    v >>= shift; 
    r |= shift;
    shift = 0xF - v >> 31 & 0x4;
    v >>= shift;
    r |= shift;
    shift = 0x3 - v >> 31 & 0x2;
    v >>= shift;
    r |= shift;
    r |= (v >> 1);
    return r;
}
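
The expressions above rely on operator precedence: 0xFFFF - v >> 31 & 0x10 is evaluated as ((0xFFFF - v) >> 31) & 0x10. The subtraction is negative exactly when v is larger than the threshold, and the arithmetic right shift by 31 turns the sign bit into a mask of all ones (-1) or all zeros. A small sketch of that single step, with a made-up helper name and assuming v is non-negative so the subtraction cannot overflow:

```csharp
public static class SignMask
{
    // Returns `bits` when v > threshold and 0 otherwise, without a branch.
    public static int SelectIfGreater(int v, int threshold, int bits)
    {
        // (threshold - v) >> 31 is -1 (all ones) if v > threshold, else 0.
        return ((threshold - v) >> 31) & bits;
    }
}
```

With this helper, the first line of Log2 would read int r = SignMask.SelectIfGreater(v, 0xFFFF, 0x10);.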

score 3 · Accepted Answer

There are some integer algorithms here.

In C#:

public static uint FloorLog2(uint x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1);
}

public static uint CeilingLog2(uint x)
{
    int y = (int)(x & (x - 1));

    y |= -y;
    y >>= (WORDBITS - 1);
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1 - y);
}

public static int NumBitsSet(uint x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);

    return (int)(x & 0x0000003f);
}

private const int WORDBITS = 32;

You should look at the original code on the site I linked for context, particularly what happens with Log2(0).
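
To make the Log2(0) behaviour visible, here is a hedged variant of the same idea that returns int instead of uint and uses the built-in BitOperations.PopCount (available from .NET Core 3.0) in place of NumBitsSet. It is a sketch, not the linked original: the class name is invented, and CeilingLog2 uses a conditional rather than the branch-free sign trick.

```csharp
using System.Numerics;

public static class IntLog2
{
    // Smear the highest set bit into every lower position, then count the bits.
    public static int FloorLog2(uint x)
    {
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        return BitOperations.PopCount(x) - 1; // yields -1 for x == 0
    }

    // Rounds up: adds 1 unless x is zero or an exact power of two.
    public static int CeilingLog2(uint x)
    {
        int notPowerOfTwo = (x & (x - 1)) != 0 ? 1 : 0;
        return FloorLog2(x) + notPowerOfTwo;
    }
}
```

In the uint-returning version above, the same -1 for an input of 0 is cast to uint and comes out as 4294967295, which is easy to miss.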

score 2 · Accepted Answer

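For more algorithms, look here: http://www.asmcommunity.net/forums/topic/?id=15010

I also did some testing in C++, and my implementation of BSR is slower than the lookup table.

I am using BDS2006, so the slowdown is probably caused by the state push/pop around the asm directive
your lookup is fine, but I am using an 11-bit table instead of 8
it splits the 32 bits into 3 branches instead of 4
and it is still small enough to handle without an init function

Code: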

//---------------------------------------------------------------------------
DWORD log2_slow(const DWORD &x)
    {
    DWORD m,i;
    if (!x) return 0;
    if (x>=0x80000000) return 31;
    for (m=1,i=0;m<x;m<<=1,i++);
     if (m!=x) i--;
    return i;
    }
//---------------------------------------------------------------------------
DWORD log2_asm(const DWORD &x)
    {
    DWORD xx=x;
    asm {
        mov eax,xx
        bsr eax,eax;
        mov xx,eax;
        }
    return xx;
    }
//---------------------------------------------------------------------------
BYTE _log2[2048]=
    {
     0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
     7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    };
DWORD log2(const DWORD &x)
    {
         if (x>=0x00400000) return _log2[x>>22]+22;
    else if (x>=0x00000800) return _log2[x>>11]+11;
    else                    return _log2[x];
    }
//---------------------------------------------------------------------------

Test code:

DWORD x,j,i,n=256;
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2     (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_asm (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_slow(j<<i); tend(); mm_log->Lines->Add(tstr(1));

My results on an AMD A8-5500 3.2 GHz:

[   0.040 ms] log2     (x) - 11bit lookup table
[   0.060 ms] log2_asm (x) - BSR
[   0.415 ms] log2_slow(x) - shift loop

Notes:

log2(0) -> 0 because DWORDs are used; in reality it should be -inf
all other values are correct for all functions

score 2 · Accepted Answer

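There are quite a few answers providing fast approximate approaches to log2(int) but few for log2(float), so here are two (Java implementations given) that use both a lookup table and mantissa/bit hacking:

Fast accurate log2(float):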

/**
 * Calculate the logarithm to base 2, handling special cases.
 */
public static float log2(float x) {

    final int bits = Float.floatToRawIntBits(x);
    final int e = (bits >> 23) & 0xff;
    final int m = (bits & 0x7fffff);

    if (e == 255) {
        if (m != 0) {
            return Float.NaN;
        }
        return ((bits >> 31) != 0) ? Float.NaN : Float.POSITIVE_INFINITY;
    }

    if ((bits >> 31) != 0) {
        return (e == 0 && m == 0) ? Float.NEGATIVE_INFINITY : Float.NaN;
    }

    return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}

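Notes:

If the argument is NaN or less than zero, then the result is NaN.
If the argument is positive infinity, then the result is positive infinity.
If the argument is positive zero or negative zero, then the result is negative infinity.

Fast accurate log2(float) (slightly faster, no checks):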

/**
 * Calculate the logarithm using base 2. Requires the argument be finite and
 * positive.
 */
public static float fastLog2(float x) {
    final int bits = Float.floatToRawIntBits(x);
    final int e = (bits >> 23) & 0xff;
    final int m = (bits & 0x7fffff);
    return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}

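This second method forgoes the checks present in the other method and therefore has the following special cases:

If the argument is NaN, then the result is incorrect.
If the argument is negative, then the result is incorrect.
If the argument is positive infinity, then the result is incorrect.
If the argument is positive zero or negative zero, then the result is negative infinity.

Both methods rely on a lookup table data (and the variables q and qm1). These are populated with the following method. n defines the accuracy/space tradeoff.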

static int q, qm1;
static float[] data;

/**
 * Compute lookup table for a given base table size.
 * 
 * @param n The number of bits to keep from the mantissa. Table storage =
 *          2^(n+1) * 4 bytes, e.g. 64Kb for n=13. Must be in the range
 *          0<=n<=23
 */
public static void populateLUT(int n) {

    final int size = 1 << (n + 1);

    q = 23 - n;
    qm1 = q - 1;
    data = new float[size];

    for (int i = 0; i < size; i++) {
        data[i] = (float) (Math.log(i << q) / Math.log(2)) - 150;
    }
}

populateLUT(12);
log2(6666); // = 12.702606
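
Since the question is about C#, the unchecked variant might be ported roughly as follows. This is a sketch under assumptions, not a tested drop-in: it uses BitConverter.SingleToInt32Bits (available from .NET Core 2.0) as the counterpart of Float.floatToRawIntBits, the class name is made up, and n is restricted to 0..22 here so that both shift counts stay non-negative.

```csharp
using System;

public static class FastLog2Float
{
    private static int q, qm1;
    private static float[] data;

    // n = number of mantissa bits kept; the table holds 2^(n+1) floats.
    // Assumes 0 <= n <= 22.
    public static void PopulateLut(int n)
    {
        int size = 1 << (n + 1);
        q = 23 - n;
        qm1 = q - 1;
        data = new float[size];
        for (int i = 0; i < size; i++)
        {
            // Math.Log(0) is -Infinity, so data[0] covers an input of zero.
            data[i] = (float)(Math.Log(i << q) / Math.Log(2.0)) - 150f;
        }
    }

    // Requires a finite, positive argument (or zero); no special-case checks.
    public static float Log2(float x)
    {
        int bits = BitConverter.SingleToInt32Bits(x);
        int e = (bits >> 23) & 0xff;   // biased exponent
        int m = bits & 0x7fffff;       // mantissa without the implicit 1
        return e == 0 ? data[m >> qm1] : e + data[(m | 0x00800000) >> q];
    }
}
```

As in the Java version, PopulateLut has to be called once before the first Log2 call, and a larger n trades memory for accuracy.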

score 2 · Accepted Answer

Yet another log2(int) function: (no longer the fastest)

    [StructLayout(LayoutKind.Explicit)]
    private struct ConverterStruct
    {
        [FieldOffset(0)] public int asInt;
        [FieldOffset(0)] public float asFloat;
    }

    public static int Log2(uint val)
    {
        ConverterStruct a;  a.asInt = 0; a.asFloat = val;
        return ((a.asInt >> 23 )+ 1) & 0x1F;
    }

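Note: The inspiration for using the exponent of a float came from SPWorley 3/22/2009. Use with caution in production code, since this would fail on architectures that are not little-endian.

If you want something "endianness" safe, then check out spender 5/3/2012. It also has zero support.

The fastest is the new built-in BitOperations.Log2(x).

Here are some benchmarks: (code here: https://github.com/SunsetQuest/Fast-Integer-Log2)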

Function               Time1  Full-32-Bit  Zero?   FUNCTION                  
BitOperationsLog2        2        Yes      Yes     BitOperations.Log2(x);
LeadingZeroCount         2        Yes      Yes     31 - BitOperations.LeadingZeroCount(x);
Log2_SunsetQuest5        16       Yes      No      ((BitConverter.DoubleToInt64Bits(val)>>52)+1) & 0xFF;
Log2_WiegleyJ            17       Yes      Yes     ...
MostSigBit_spender       17       Yes      Yes     ...
Log2_SPWorley            17       Yes      Yes     ...
Log2_SunsetQuest4        18       Yes      No      ...
FloorLg2_Matthew_Watson  18       Yes      Yes     ...
Log2_SunsetQuest3        19       Yes      No      ...
Log2_SunsetQuest1        20       Yes      Yes     ...
Log2_HarrySvensson       20       Yes      Yes     ...
Log2_DanielSig           21       No       Yes     ...
HighestBitUnrolled_Kaz   25       Yes      Yes     ...
FloorLog2_SN17           36       Yes      Yes     ...
Log2_Papayaved           44       Yes      Yes     ...
GetMsb_user3177100       45       Yes      Yes     ...
Log2_Flynn1179           57       Yes      Yes     ...
Msb_Protagonist          63       Yes      Yes     ...
SomeOtherMethod          76       Yes      Yes     ...
Log2_SunsetQuest0        98       Yes      Yes     ...
Log2_SunsetQuest2        131      Yes      Yes     ...
SomeOtherMethod          202      Yes      Yes     ...
SomeOtherMethod          545      Yes      Yes     ...

Zero_Support    = Supports Neg Return on Zero
Full-32-Bit     = Supports full 32-bit (some just support 31 bits)
SomeOtherMethod = name of function/person left out on purpose
Benchmark notes: AMD Ryzen, Release, no-debugger attached, .net 6.0

score 2 · Accepted Answer

inline int fast_log2(register double x)
{ 
    return (reinterpret_cast<uint64_t&>(x) >> 52) - 1023;
};
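
The snippet above reads the 11-bit biased exponent of an IEEE 754 double and subtracts the bias of 1023, which gives floor(log2(x)) for positive, normal values. A C# counterpart might look like the sketch below; the names are hypothetical, and BitConverter.DoubleToInt64Bits stands in for the reinterpret_cast. Because a double has a 53-bit significand, every uint converts exactly, so the same trick works for 32-bit integers without the rounding that a float conversion can introduce above 2^24.

```csharp
using System;

public static class ExponentLog2
{
    // floor(log2(x)) for finite, positive, normal doubles.
    public static int FloorLog2(double x)
    {
        long bits = BitConverter.DoubleToInt64Bits(x);
        // Bits 52..62 hold the biased exponent; the mask drops the sign bit.
        return (int)((bits >> 52) & 0x7FF) - 1023;
    }

    // Every uint is exactly representable as a double, so no rounding occurs.
    // An input of 0 has exponent field 0 and therefore returns -1023.
    public static int FloorLog2UInt32(uint x)
    {
        return FloorLog2((double)x);
    }
}
```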

score 1 · Accepted Answer

Clean, reliable and fast!
(requires .NET Core 3 or later)

int val = BitOperations.Log2(x);
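
A slightly fuller sketch of the built-in route, assuming .NET Core 3.0 or later and the System.Numerics namespace (the wrapper class and method names are made up for the example). One detail worth knowing is the zero case: BitOperations.Log2 returns 0 for an input of 0, while the LeadingZeroCount form returns -1, which lets the caller tell 0 and 1 apart.

```csharp
using System.Numerics;

public static class BuiltInLog2
{
    // Returns 0 for an input of 0, the same as for an input of 1.
    public static int Log2(uint x)
    {
        return BitOperations.Log2(x);
    }

    // Same result for non-zero x; LeadingZeroCount(0) is 32, so 0 maps to -1.
    public static int Log2OrMinusOne(uint x)
    {
        return 31 - BitOperations.LeadingZeroCount(x);
    }
}
```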

Answered 2020-11-29T20:33:02.383

score 0 · Accepted Answer

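(I didn't do any measurements, so this may not measure up, but I thought the idea from user9337139 was neat and wanted to try the same thing in C#; his is C++.)

Here is a C# int Magnitude(byte) function based on converting a byte value to a float and extracting the exponent from the IEEE float representation.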

    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Explicit)]
    struct UnionWorker
    {
        [FieldOffset(0)]
        public int i;
        [FieldOffset(0)]
        public float f;
    }

    static int Magnitude(byte b)
    {
        UnionWorker u;
        u.i = 0; // just to please the compiler
        u.f = b;
        return Math.Max((u.i >> 23) & 0xFF, 126) - 126;
    }

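Returns 0 for zero, 8 for 0xFF, and other values as you would expect.

Zero is a special case, so I need the Math.Max clamp. I suspect the solution from user9337139 might have similar issues.

Note that this has not been tested for endianness issues: caveat emptor.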

score 0 · Accepted Answer

    static byte FloorLog2(UInt16 value)
    {
        for (byte i = 0; i < 15; ++i)
        {
            if ((value >>= 1) < 1)
            {
                return i;
            }
        }
        return 15;
    }
