java - Loop performance in Java (with vs. without bit shift, for vs. while)

Question

I just ran a small test with loops in Java. I assumed that a bit shift in Java is generally faster than a plain integer increment. So here is my sample code:

final int n = 16;
long n1 = System.nanoTime();
for (int i = 1; i < 1 << n; i <<= 1) {
    // nothing
}
long n2 = System.nanoTime();
for (int i = 0; i < n; i++) {
    // nothing
}
long n3 = System.nanoTime();
System.out.println("with shift = " + (n2 - n1) + " ns");
System.out.println("without shift = " + (n3 - n2) + " ns");

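My expectation was that the time between n1 and n2 would be smaller than the time between n2 and n3. But every time I run this snippet, the integer increment appears to be faster. Here is the output of the code above from three runs: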

with shift = 2445 ns
without shift = 1885 ns

with shift = 2374 ns
without shift = 1886 ns

with shift = 2374 ns
without shift = 1607 ns

Can someone explain this behaviour? Is the answer in the way the JVM compiles this code, or does it depend on the underlying architecture?

Ubuntu Linux 3.5.0-17-generic i686 GNU/Linux
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core CPU       T4300  @ 2.10GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1200.000
cache size  : 1024 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips    : 4189.42
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Pentium(R) Dual-Core CPU       T4300  @ 2.10GHz
stepping    : 10
microcode   : 0xa07
cpu MHz     : 1200.000
cache size  : 1024 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips    : 4189.42
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

========== EDIT ================

OK, so I updated my code to get better measurements.

My JVM:

java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Server VM (build 20.12-b01, mixed mode)

New code:

// number of shifts
final int n = 16;
// recorded times
long n1 = 0, n2 = 0, n3 = 0, n4 = 0, n5 = 0;
// measured times
long withShiftFor = Long.MAX_VALUE;
long withoutShiftFor = Long.MAX_VALUE;
long withShiftWhile = Long.MAX_VALUE;
long withoutShiftWhile = Long.MAX_VALUE;
// instance to operate with
boolean b = true;
// do some loops to measure a better result
for (int x = 0; x < 2000000; x++) {
    // for loop with shift
    n1 = System.nanoTime();
    for (int i = 1; i < 1 << n; i <<= 1) {
        b = !b;
    }
    // for loop without shift
    n2 = System.nanoTime();
    for (int i = 0; i < n; i++) {
        b = !b;
    }
    // while loop with shift
    n3 = System.nanoTime();
    int i = 1;
    while (i < 1 << n) {
        b = !b;
        i <<= 1;
    }
    // while loop without shift
    n4 = System.nanoTime();
    int j = 0;
    while (j < n) {
        b = !b;
        j++;
    }
    n5 = System.nanoTime();
    // take minimal time to save best result
    withShiftFor = Math.min(withShiftFor, n2 - n1);
    withoutShiftFor = Math.min(withoutShiftFor, n3 - n2);
    withShiftWhile = Math.min(withShiftWhile, n4 - n3);
    withoutShiftWhile = Math.min(withoutShiftWhile, n5 - n4);
}
System.out.println("for with shift = " + withShiftFor + " ns");
System.out.println("for without shift = " + withoutShiftFor + " ns");
System.out.println("while with shift = " + withShiftWhile + " ns");
System.out.println("while without shift = " + withoutShiftWhile + " ns");

New output from 3 runs (each run took more than 5 seconds):

for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns

for with shift = 907 ns
for without shift = 907 ns
while with shift = 907 ns
while without shift = 907 ns

for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns

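So you are right: after a few seconds and many iterations, the results are nearly identical. But why isn't the for loop with the shift faster than the other variants? Does the JVM apply some optimization here, even though you mentioned 1 bytecode for the increment versus 4 for the shift? And why is the while loop with the increment just as fast as the other loops?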

score 2 · Accepted Answer

"Can someone explain this behaviour? Is the answer in the way the JVM compiles this code, or does it depend on the underlying architecture?"

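When you run a short loop, the code is interpreted. So if you don't plan to run the code very often, or you can't warm it up, then benchmarking the interpreted code is the right thing to do, but you should expect odd results like the ones you got.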

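If you want to compare the code once it has been compiled and optimised, you should ignore the first 10K to 20K iterations, because by default a loop needs to iterate about 10K times before it is compiled (and the compilation then happens in the background, which takes a little time).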

In any case, I also suggest running the test for at least 2 seconds to reduce variation.
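To make the warm-up advice concrete, here is a minimal sketch of a harness that first runs each loop about 20K times without recording anything and only then measures an average over many rounds. The class name, method names, and round counts are my own illustrative assumptions, not something from the question. The loops return a count that is accumulated into a sink so that the work has a visible result. Even so, the JIT may still reduce such simple loops to a constant, so treat the numbers as indicative only; a dedicated harness such as JMH is generally more reliable for this kind of measurement.

```java
// Hypothetical harness: names and round counts are illustrative assumptions.
class WarmupBenchmarkSketch {
    static final int N = 16;

    // Loop driven by a left shift; returns the number of iterations.
    static int shiftLoop() {
        int count = 0;
        for (int i = 1; i < 1 << N; i <<= 1) {
            count++;
        }
        return count;
    }

    // Loop driven by an increment; returns the number of iterations.
    static int incrementLoop() {
        int count = 0;
        for (int i = 0; i < N; i++) {
            count++;
        }
        return count;
    }

    // Runs the chosen loop `rounds` times and returns the average ns per call.
    // Results are added to sink[0] so the calls have an observable effect.
    static double averageNanos(boolean useShift, int rounds, long[] sink) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink[0] += useShift ? shiftLoop() : incrementLoop();
        }
        return (System.nanoTime() - start) / (double) rounds;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        // Warm-up phase: give the JIT a chance to compile both loops.
        averageNanos(true, 20000, sink);
        averageNanos(false, 20000, sink);
        // Measurement phase: many rounds, timed as a whole rather than per call.
        double shift = averageNanos(true, 5000000, sink);
        double increment = averageNanos(false, 5000000, sink);
        System.out.println("shift     = " + shift + " ns/call");
        System.out.println("increment = " + increment + " ns/call");
        System.out.println("sink      = " + sink[0]);
    }
}
```

Timing a whole batch of rounds with one pair of System.nanoTime() calls, rather than timing each round, keeps the timer's own cost from dominating the result.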

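Your loops don't do anything, so I would expect the JIT to eliminate them. You then end up measuring only how long System.nanoTime() itself takes, which can add 40 to 1000 ns depending on the system.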
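One way to see how much of a measurement is just timer overhead is to call System.nanoTime() twice in a row with nothing in between. The sketch below does that and keeps the smallest difference seen; the class and method names are hypothetical, and the value it prints depends entirely on the machine and JVM.

```java
// Hypothetical helper: estimates the cost and granularity of System.nanoTime().
class NanoTimeOverheadSketch {
    // Returns the smallest observed difference between two back-to-back
    // System.nanoTime() calls over `samples` attempts.
    static long minBackToBackDelta(int samples) {
        long min = Long.MAX_VALUE;
        for (int s = 0; s < samples; s++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            min = Math.min(min, t1 - t0);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println("min back-to-back delta = "
                + minBackToBackDelta(1000000) + " ns");
    }
}
```

If the loop timings in the question are close to this value, the loops themselves are contributing little or nothing to the measurement.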

score 1 · Accepted Answer

Shifting a number takes 4 bytecodes, whereas incrementing takes only 1. As Peter Lawrey said, the JIT compiler may change this later on.
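As a rough illustration, the sketch below puts the two update expressions side by side. The bytecode named in the comments is what javac typically emits for an int local variable (you can check it with javap -c); the class and method names are hypothetical. It also shows that both loop shapes from the question run the same number of iterations, so any difference comes from the update step alone.

```java
// Hypothetical demo class: compares the two loop shapes from the question.
class LoopShapeSketch {
    // `i <<= 1` on an int local typically compiles to four instructions:
    // iload, iconst_1, ishl, istore.
    static int countShiftIterations(int n) {
        int count = 0;
        for (int i = 1; i < 1 << n; i <<= 1) {
            count++;
        }
        return count;
    }

    // `i++` on an int local typically compiles to a single iinc instruction.
    static int countIncrementIterations(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countShiftIterations(16));     // prints 16
        System.out.println(countIncrementIterations(16)); // prints 16
    }
}
```

The bytecode count only matters while the code is interpreted. Once the JIT has compiled the loop, both updates are single cheap machine operations, which fits the near-identical timings in the edited question.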
