4

我看到JDK-8055494 : Add C2 x86 intrinsic for BigInteger::multiplyToLen() 方法已针对 8u40 更新进行了修复,但我没想到它会将 BigInteger 乘法速度提高 2 到 3 倍。

以下是在两个 JVM 更新上通过不同公式计算阶乘的基准测试结果:

8u31

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      13.994 ±     0.175  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.202 ±     0.054  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      12.066 ±     8.011  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.253 ±     0.055  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      18.255 ±     2.656  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.286 ±     0.063  ops/ms

8u40

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      33.704 ±     0.445  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.428 ±     0.199  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      38.170 ±     0.433  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.557 ±     0.030  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      46.447 ±    11.582  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.586 ±     0.154  ops/ms

我应该期望这种 JDK 改进可以更快地运行我的代码,还是这又是一个临时基准?

编辑

如何验证我的发现以证明这种加速是由于 C2 固有的?

测试功能代码:

@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
public class JavaFactorial {
    @Param({"10", "100", "1000", "10000"})
    public int n;

    private static ForkJoinPool pool = new ForkJoinPool();

    @Benchmark
    public BigInteger loop() {
        return n > 20 ? loop(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursion() {
        return n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursionPar() {
        return n > 20 ? recursePar(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger split() {
        return n > 180 ? split(n) : (n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n)));
    }

    private long fastLoop(final int n1, int n2) {
        long p = n1;
        while (n2 > n1) {
            p = p * n2;
            n2--;
        }
        return p;
    }

    private BigInteger loop(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1++;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        return recursion(nm + 1, n2).multiply(recursion(n1, nm));
    }

    private BigInteger recursePar(final int n1, final int n2) {
        if (n2 - n1 < 700) {
            return recursion(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        RecursiveTask<BigInteger> t = new RecursiveTask<BigInteger>() {
            protected BigInteger compute() {
                return recursePar(nm + 1, n2);
            }
        };
        if (ForkJoinTask.getPool() == pool) {
            t.fork();
        } else {
            pool.execute(t);
        }
        return recursePar(n1, nm).multiply(t.join());
    }

    private BigInteger loop2(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1 += 2;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion2(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop2(n1, n2);
        }
        final int nm = ((n1 + n2) >> 1) | 1;
        return recursion2(nm, n2).multiply(recursion2(n1, nm - 2));
    }

    private BigInteger split(int n) {
        int i = 31 - Integer.numberOfLeadingZeros(n), s = -n, o = 1;
        BigInteger p = BigInteger.ONE, r = BigInteger.ONE;
        while (i >= 0) {
            int h = n >> i;
            int o1 = (h - 1) | 1;
            if (o < o1) {
                p = p.multiply(recursion2(o + 2, o1));
                r = r.multiply(p);
            }
            o = o1;
            s += h;
            i--;
        }
        return r.shiftLeft(s);
    }
}
4

2 回答 2

2

在调查了可用 JVM 选项列表后,我发现添加到 8u40 中的选项可以准确地打开/关闭所需的内在特性。

以下是 8u40 的结果,与-XX:-UseMultiplyToLenIntrinsic8u31 的结果非常接近:

    [info] JavaFactorial.recursion              1000  thrpt    5      12.521 ±      3.352  ops/ms
    [info] JavaFactorial.recursion             10000  thrpt    5       0.217 ±      0.069  ops/ms
    [info] JavaFactorial.recursionPar           1000  thrpt    5      14.268 ±      8.319  ops/ms
    [info] JavaFactorial.recursionPar          10000  thrpt    5       0.286 ±      0.015  ops/ms
    [info] JavaFactorial.split                  1000  thrpt    5      18.768 ±      4.321  ops/ms
    [info] JavaFactorial.split                 10000  thrpt    5       0.255 ±      0.076  ops/ms
于 2015-04-12T07:36:35.063 回答
1

这个基准肯定表明任何大量使用BigInteger乘法的代码都会受益。

您的问题的真正答案是为您自己的代码编写一个基准测试并在两者下运行8u318u40希望您的代码更快,因此对您的代码进行基准测试并对其进行测试。如果 JRE 代码或其他人的代码更快,则表明您的代码可能会受益,但您必须尝试一下。

于 2015-04-09T13:00:46.133 回答