2

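I have been running some tests to see how inlining function code (writing the function's algorithm directly in the calling code) affects performance. I wrote a simple byte-array-to-integer conversion, then wrapped it in a function, called it statically from another class, and called it statically from the class itself. The code is as follows: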

public class FunctionCallSpeed {
    public static final int numIter = 50000000;

    public static void main (String [] args) {
        byte [] n = new byte[4];

        long start;

        System.out.println("Function from Static Class =================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            StaticClass.toInt(n);
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");

        System.out.println("Function from Class ========================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            toInt(n);
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");

        int actual = 0;

        int len = n.length;

        System.out.println("Inline Function ============================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < len; j++) {
                actual += n[len - 1 - j] << 8 * j;
            }
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
    }

    public static int toInt(byte [] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }
}

The results are as follows:

Function from Static Class =================
Elapsed time: 0.096559931s
Function from Class ========================
Elapsed time: 0.015741711s
Inline Function ============================
Elapsed time: 0.837626286s

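Is there something strange going on in the bytecode? I have looked at the bytecode myself, but I am not very familiar with it and cannot make heads or tails of it.

Edit

I added assert statements to read the output and randomized the bytes being read, and the benchmark now behaves the way I thought it would. Thanks to Tomasz Nurkiewicz, who pointed me to the micro-benchmarking article. The resulting code is: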

public class FunctionCallSpeed {
public static final int numIter = 50000000;

public static void main (String [] args) {
    byte [] n;

    long start, end;
    int checker, calc;

    end = 0;
    System.out.println("Function from Object =================");
    for (int i = 0; i < numIter; i++) {
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        calc = StaticClass.toInt(n);
        end += System.nanoTime() - start;
        assert calc == checker;
    }
    System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");
    end = 0;
    System.out.println("Function from Class ==================");
    start = System.nanoTime();
    for (int i = 0; i < numIter; i++) {
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        calc = toInt(n);
        end += System.nanoTime() - start;
        assert calc == checker;
    }
    System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");


    int len = 4;
    end = 0;
    System.out.println("Inline Function ======================");
    start = System.nanoTime();
    for (int i = 0; i < numIter; i++) {
        calc = 0;
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        for (int j = 0; j < len; j++) {
            calc += n[len - 1 - j] << 8 * j;
        }
        end += System.nanoTime() - start;
        assert calc == checker;
    }
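    // Note: this prints the time since the last `start`, not the accumulated `end`,
    // so the "Inline" figure is not comparable with the two totals above.
    System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");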
}

public static byte [] toByte(int val) {
    byte [] n = new byte[4];

    for (int i = 0; i < 4; i++) {
        n[i] = (byte)((val >> 8 * i) & 0xFF);
    }
    return n;
}

public static int toInt(byte [] num) {
    int actual = 0;

    int len = num.length;

    for (int i = 0; i < len; i++) {
        actual += num[len - 1 - i] << 8 * i;
    }

    return actual;
}
}

Results:

Function from Static Class =================
Elapsed time: 9.276437031s
Function from Class ========================
Elapsed time: 9.225660708s
Inline Function ============================
Elapsed time: 5.9512E-5s
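
A side note on the `assert calc == checker` check: as written, `toByte` stores the low byte at index 0 while `toInt` treats the last index as the low byte, and Java bytes are sign-extended when shifted, so the round trip would not generally return the input. Java also skips `assert` statements unless the JVM is started with `-ea`, which may be why nothing failed. The following is a minimal sketch of a matching pair, assuming big-endian order (an arbitrary choice for illustration) and using an explicit check instead of `assert`; the class name `RoundTrip` is hypothetical.

```java
public class RoundTrip {
    // Big-endian layout: most significant byte at index 0.
    static byte[] toByte(int val) {
        byte[] n = new byte[4];
        for (int i = 0; i < 4; i++) {
            n[3 - i] = (byte) (val >> 8 * i);
        }
        return n;
    }

    // Inverse of toByte: reads index 3 as the least significant byte.
    static int toInt(byte[] num) {
        int actual = 0;
        for (int i = 0; i < 4; i++) {
            // Mask with 0xFF to undo the sign extension of the byte.
            actual |= (num[3 - i] & 0xFF) << 8 * i;
        }
        return actual;
    }

    public static void main(String[] args) {
        int checker = (int) (Math.random() * 65535);
        int calc = toInt(toByte(checker));
        // Explicit check, so it runs even when assertions are disabled.
        if (calc != checker) {
            throw new AssertionError(calc + " != " + checker);
        }
        System.out.println("round trip ok for " + checker);
    }
}
```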
4

4 Answers

5

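It's always hard to know for sure what the JIT is doing, but if I had to guess, it noticed that the function's return value was never used and optimized most of the call away.

If you actually use the function's return value, I bet the timings will change.

answered 2012-08-30T16:43:06.307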
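
As a rough illustration of what "using the return value" could look like, the sketch below folds every result into a checksum that is printed at the end, and varies the input so the result is not loop-invariant. The class name `UseResult` and the helper `timedSum` are made up for this example; whether this is enough to stop a given JIT from optimizing the loop is an assumption, not a guarantee.

```java
public class UseResult {
    // Same conversion as in the question.
    static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Folds every result into a sum that the caller prints,
    // so the calls cannot simply be treated as dead code.
    static long timedSum(byte[] n, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            n[3] = (byte) i; // vary the input on each iteration
            sum += toInt(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] n = new byte[4];
        long start = System.nanoTime();
        long sum = timedSum(n, 50_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("checksum=" + sum + " elapsed=" + elapsed / 1e9 + "s");
    }
}
```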
3

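You have a number of issues, but the main one is that you are timing a single pass over code that is being optimised while it runs. That is sure to give you mixed results. I suggest making the test run for 2 seconds and ignoring the first 10,000 iterations or so.

If the result of the loop is not kept, the entire loop can be discarded after some random interval.

Breaking each test into a separate method: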
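
One way the "warm up, then run for about 2 seconds" advice might be sketched is shown below. The names `WarmedUpTimer`, `averageNanos` and `sink` are hypothetical, and passing the work as a lambda adds an indirection of its own that could influence what the JIT does, so treat this as an outline rather than a reliable harness.

```java
import java.util.function.IntSupplier;

public class WarmedUpTimer {
    static int sink; // written on every iteration so the work stays observable

    // Returns the average nanoseconds per call, measured after a warm-up phase.
    static double averageNanos(IntSupplier work, int warmupIterations, long runNanos) {
        // Warm-up: these iterations are executed but not timed.
        for (int i = 0; i < warmupIterations; i++) {
            sink += work.getAsInt();
        }
        long iterations = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            // Check the clock once per batch to keep timer overhead small.
            for (int i = 0; i < 1000; i++) {
                sink += work.getAsInt();
            }
            iterations += 1000;
            elapsed = System.nanoTime() - start;
        } while (elapsed < runNanos);
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        byte[] n = {1, 2, 3, 4};
        // Ignore the first 10,000 calls, then measure for roughly 2 seconds.
        double ns = averageNanos(() -> n[3] + (n[2] << 8), 10_000, 2_000_000_000L);
        System.out.println(ns + " ns/call (sink=" + sink + ")");
    }
}
```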

public class FunctionCallSpeed {
    public static final int numIter = 50000000;
    private static int dontOptimiseAway;

    public static void main(String[] args) {
        byte[] n = new byte[4];

        for (int i = 0; i < 10; i++) {
            test1(n);
            test2(n);
            test3(n);
            System.out.println();
        }
    }

    private static void test1(byte[] n) {
        System.out.print("from Static Class: ");
        long start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            dontOptimiseAway = FunctionCallSpeed.toInt(n);
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    private static void test2(byte[] n) {
        long start;
        System.out.print("from Class: ");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            dontOptimiseAway = toInt(n);
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    private static void test3(byte[] n) {
        long start;
        int actual = 0;

        int len = n.length;

        System.out.print("Inlined: ");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < len; j++) {
                actual += n[len - 1 - j] << 8 * j;
            }
            dontOptimiseAway = actual;
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    public static int toInt(byte[] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }
}

prints

from Class: 7ns Inlined: 11ns from Static Class: 9ns 
from Class: 6ns Inlined: 8ns from Static Class: 8ns 
from Class: 6ns Inlined: 9ns from Static Class: 6ns

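This suggests that when the inner loop is optimised separately, it is slightly more efficient.

However, if I use an optimised byte-to-int conversion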

public static int toInt(byte[] num) {
    return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24);
}

all tests report

from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 

because it realises the test isn't doing anything useful. ;)

answered 2012-08-30T16:43:33.323
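
When reading the `0ns` figures, it may also help to remember that `(System.nanoTime() - start) / numIter` is integer division, so any average below one nanosecond prints as 0 whether or not the loop was removed entirely. A small sketch of the difference, using a made-up elapsed time of 20 ms (the class and method names are hypothetical):

```java
public class PerIterationTime {
    // Average nanoseconds per iteration as a double, so sub-nanosecond values survive.
    static double perIteration(long elapsedNanos, int iterations) {
        return (double) elapsedNanos / iterations;
    }

    public static void main(String[] args) {
        long elapsed = 20_000_000L; // hypothetical: 20 ms for the whole loop
        int numIter = 50_000_000;
        // long / int division truncates toward zero
        System.out.println(elapsed / numIter + "ns");
        // floating-point division keeps the fraction
        System.out.println(perIteration(elapsed, numIter) + "ns");
    }
}
```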
3

I ported your test case to caliper:

import com.google.caliper.SimpleBenchmark;

public class ToInt extends SimpleBenchmark {

    private byte[] n;
    private int total;

    @Override
    protected void setUp() throws Exception {
        n = new byte[4];
    }

    public int timeStaticClass(int reps) {
        for (int i = 0; i < reps; i++) {
            total += StaticClass.toInt(n);
        }
        return total;
    }

    public int timeFromClass(int reps) {
        for (int i = 0; i < reps; i++) {
            total += toInt(n);
        }
        return total;
    }

    public int timeInline(int reps) {
        for (int i = 0; i < reps; i++) {
            int actual = 0;
            int len = n.length;
            for (int i1 = 0; i1 < len; i1++) {
                actual += n[len - 1 - i1] << 8 * i1;
            }
            total += actual;
        }
        return total;
    }

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }
}

class StaticClass {
    public static int toInt(byte[] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }

}

It does indeed seem that the inlined version is the slowest, while the two static versions are almost identical (as expected):

[Image: caliper results]

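The cause is hard to pin down. I can think of two factors:

  • The JVM is better at micro-optimizations when code blocks are as small and as easy to reason about as possible. When the function is inlined by hand, the code as a whole becomes more complex and the JVM gives up. With the smaller toInt() function, the JIT can be smarter.

  • Cache locality - somehow the JVM performs better with two small pieces of code (the loop and the method) than with one larger one.

answered 2012-08-30T17:21:06.927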
0

Your test is flawed. The second test benefits from the first test having already run. You need to run each test case in its own JVM invocation.

answered 2012-08-30T17:07:51.003
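
A minimal sketch of how that could be arranged: the test to run is chosen by a command-line argument, so each variant gets a fresh JVM. The class name `OneTestPerJvm` and the test names `call` and `inline` are invented for this example.

```java
public class OneTestPerJvm {
    static int sink; // keeps results observable so the loops are not dead code

    static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Runs exactly one named variant and returns the elapsed nanoseconds.
    static long run(String name, byte[] n, int iterations) {
        long start = System.nanoTime();
        switch (name) {
            case "call":
                for (int i = 0; i < iterations; i++) {
                    sink += toInt(n);
                }
                break;
            case "inline":
                for (int i = 0; i < iterations; i++) {
                    int len = n.length;
                    for (int j = 0; j < len; j++) {
                        sink += n[len - 1 - j] << 8 * j;
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("unknown test: " + name);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "call";
        long nanos = run(name, new byte[] {1, 2, 3, 4}, 50_000_000);
        System.out.println(name + ": " + nanos / 1e9 + "s (sink=" + sink + ")");
    }
}
```

It would be run once per variant, for example `java OneTestPerJvm call` followed by `java OneTestPerJvm inline`, so neither run sees code already compiled for the other.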