21

考虑这个例子:

public static void main(final String[] args) {
    final List<String> myList = Arrays.asList("A", "B", "C", "D");
    final long start = System.currentTimeMillis();
    for (int i = 1000000; i > myList.size(); i--) {
        System.out.println("Hello");
    }
    final long stop = System.currentTimeMillis();
    System.out.println("Finish: " + (stop - start));
}

对比

public static void main(final String[] args) {
    final List<String> myList = Arrays.asList("A", "B", "C", "D");
    final long start = System.currentTimeMillis();
    final int size = myList.size();
    for (int i = 1000000; i > size; i--) {
        System.out.println("Hello");
    }
    final long stop = System.currentTimeMillis();
    System.out.println("Finish: " + (stop - start));
}

这会有什么不同吗?在我的机器上,第二个似乎执行得更快,但我不知道它是否真的准确。编译器会优化这段代码吗?如果循环条件是不可变对象(例如字符串数组),我可以认为他会这样做。

4

13 回答 13

25

如果你想测试这样的东西,你真的必须优化你的微基准来衡量你关心的东西。

首先,使循环便宜不可能跳过。计算总和通常可以解决问题。

其次,比较两个时间。

这里有一些代码可以做到这两点:

import java.util.*;

public class Test {

public static long run1() {
  final List<String> myList = Arrays.asList("A", "B", "C", "D");
  final long start = System.nanoTime();
  int sum = 0;
  for (int i = 1000000000; i > myList.size(); i--) sum += i;
  final long stop = System.nanoTime();
  System.out.println("Finish: " + (stop - start)*1e-9 + " ns/op; sum = " + sum);
  return stop-start;
}

public static long run2() {
  final List<String> myList = Arrays.asList("A", "B", "C", "D");
  final long start = System.nanoTime();
  int sum = 0;
  int limit = myList.size();
  for (int i = 1000000000; i > limit; i--) sum += i;
  final long stop = System.nanoTime();
  System.out.println("Finish: " + (stop - start)*1e-9 + " ns/op; sum = " + sum);
  return stop-start;
}

public static void main(String[] args) {
  for (int i=0 ; i<5 ; i++) {
    long t1 = run1();
    long t2 = run2();
    System.out.println("  Speedup = " + (t1-t2)*1e-9 + " ns/op\n");
  }
}

}

如果我们运行它,在我的系统上我们会得到:

Finish: 0.481741256 ns/op; sum = -243309322
Finish: 0.40228402 ns/op; sum = -243309322
  Speedup = 0.079457236 ns/op

Finish: 0.450627151 ns/op; sum = -243309322
Finish: 0.43534661700000005 ns/op; sum = -243309322
  Speedup = 0.015280534 ns/op

Finish: 0.47738474700000005 ns/op; sum = -243309322
Finish: 0.403698331 ns/op; sum = -243309322
  Speedup = 0.073686416 ns/op

Finish: 0.47729349600000004 ns/op; sum = -243309322
Finish: 0.405540508 ns/op; sum = -243309322
  Speedup = 0.071752988 ns/op

Finish: 0.478979617 ns/op; sum = -243309322
Finish: 0.36067492700000003 ns/op; sum = -243309322
  Speedup = 0.11830469 ns/op

这意味着方法调用的开销约为 0.1 ns。如果您的循环执行的时间不超过 1-2 ns,那么您应该关心这一点。否则,不要。

于 2010-03-05T01:01:22.020 回答
10

I once worked on a project where my first task was to track down some insanely slow code (it was on a brand new 486 machine and it took about 20 minutes to execute):

for(size_t i = 0; i < strlen(data); i++)
{
    // do something with data[i]
}

The solution was (got it down to something like two minutes or less):

size_t length = strlen(data);

for(int i = 0; i < length; i++)
{
    // do something with data[i]
}

The issue is that "data" was over 1 million characters, and strlen has to count each one all the time.

In the case of Java the "size()" method probably returns a variable, and as such, the VM will inline it. On a VM like the one on Android it probably does not. So the answer is "it depends".

My personal preference is to never call a method more than one time if it is supposed to return the same result each time. That way if the method does involve a calculation it is performed only one time and then it is never an issue.

于 2010-03-04T23:28:51.917 回答
10

就个人而言,我认为你不能从这样一个人为的例子中得出任何有意义的结论。

但是如果你真的想知道,为什么不使用 javap 反编译代码,看看有什么不同呢?当您无需在这里询问即可亲眼看到时,为什么还要猜测编译器在做什么呢?

第一种情况的字节码:

public class Stackoverflow extends java.lang.Object{
public Stackoverflow();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Code:
   0:   iconst_4
   1:   anewarray       #2; //class java/lang/String
   4:   dup
   5:   iconst_0
   6:   ldc     #3; //String A
   8:   aastore
   9:   dup
   10:  iconst_1
   11:  ldc     #4; //String B
   13:  aastore
   14:  dup
   15:  iconst_2
   16:  ldc     #5; //String C
   18:  aastore
   19:  dup
   20:  iconst_3
   21:  ldc     #6; //String D
   23:  aastore
   24:  invokestatic    #7; //Method java/util/Arrays.asList:([Ljava/lang/Object;)Ljava/util/List
   27:  astore_1
   28:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   31:  lstore_2
   32:  ldc     #9; //int 1000000
   34:  istore  4
   36:  iload   4
   38:  aload_1
   39:  invokeinterface #10,  1; //InterfaceMethod java/util/List.size:()I
   44:  if_icmple       61
   47:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   50:  ldc     #12; //String Hello
   52:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   55:  iinc    4, -1
   58:  goto    36
   61:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   64:  lstore  4
   66:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   69:  new     #14; //class java/lang/StringBuilder
   72:  dup
   73:  invokespecial   #15; //Method java/lang/StringBuilder."<init>":()V
   76:  ldc     #16; //String Finish:
   78:  invokevirtual   #17; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/la
   81:  lload   4
   83:  lload_2
   84:  lsub
   85:  invokevirtual   #18; //Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
   88:  invokevirtual   #19; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   91:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   94:  return
}

第二种情况的字节码:

public class Stackoverflow extends java.lang.Object{
public Stackoverflow();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Code:
   0:   iconst_4
   1:   anewarray       #2; //class java/lang/String
   4:   dup
   5:   iconst_0
   6:   ldc     #3; //String A
   8:   aastore
   9:   dup
   10:  iconst_1
   11:  ldc     #4; //String B
   13:  aastore
   14:  dup
   15:  iconst_2
   16:  ldc     #5; //String C
   18:  aastore
   19:  dup
   20:  iconst_3
   21:  ldc     #6; //String D
   23:  aastore
   24:  invokestatic    #7; //Method java/util/Arrays.asList:([Ljava/lang/Object;)Ljava/util/List;
   27:  astore_1
   28:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   31:  lstore_2
   32:  aload_1
   33:  invokeinterface #9,  1; //InterfaceMethod java/util/List.size:()I
   38:  istore  4
   40:  ldc     #10; //int 1000000
   42:  istore  5
   44:  iload   5
   46:  iload   4
   48:  if_icmple       65
   51:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   54:  ldc     #12; //String Hello
   56:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   59:  iinc    5, -1
   62:  goto    44
   65:  invokestatic    #8; //Method java/lang/System.currentTimeMillis:()J
   68:  lstore  5
   70:  getstatic       #11; //Field java/lang/System.out:Ljava/io/PrintStream;
   73:  new     #14; //class java/lang/StringBuilder
   76:  dup
   77:  invokespecial   #15; //Method java/lang/StringBuilder."<init>":()V
   80:  ldc     #16; //String Finish:
   82:  invokevirtual   #17; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   85:  lload   5
   87:  lload_2
   88:  lsub
   89:  invokevirtual   #18; //Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
   92:  invokevirtual   #19; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   95:  invokevirtual   #13; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   98:  return
}

存在差异,但我不确定我能否就它们对性能的影响做出明确的陈述。

我会编写第二个代码,因为这意味着(从表面上看)一个方法调用,而不是每个循环迭代一个。我不知道编译器是否可以优化它,但我确信我可以很容易地做到这一点。所以我这样做,不管它对墙上时间的影响。

于 2010-03-04T23:18:15.633 回答
5

请注意,javac编译器与优化无关。“重要”编译器是存在于 JVM 中的 JIT 编译器。

在您的示例中,在最通用的情况下, themyList.size()是一个简单的方法调度,它返回List实例中字段的内容。与所暗示的相比,这是微不足道的工作System.out.println("Hello")(至少一个系统调用,因此数百个时钟周期,而方法调度不超过十几个)。我非常怀疑您的代码是否会在速度上表现出有意义的差异。

在更一般的基础上,JIT 编译器应该将此调用识别size()为对已知实例的调用,以便它可以通过直接函数调用(更快)执行方法分派,甚至内联size()方法调用,从而减少调用一个简单的实例字段访问。

于 2010-03-04T23:18:41.223 回答
2

它无法优化它,因为 mylist.size() 可能会在循环执行期间发生变化。即使它是最终的,这仅意味着引用是最终的(意味着您不能将 myList 重新分配给其他对象),但 myList 上的方法,例如 remove() 和 add() 仍然可用。Final 不会使对象不可变。

于 2010-03-04T23:18:49.397 回答
1

Java compiler would have optimized it so, but didn't do so by seeing the funny condition. If you wrote it like this there would be no issue.

for (int i = myList.size(); i < 1000000; i--) {
    System.out.println("Hello");
}
于 2010-03-04T23:28:42.483 回答
1

几乎可以肯定,您在这里看到的是 HotSpot 内联的不同之处。使用更简单的循环,它更有可能内联,从而摆脱所有多余的垃圾。它可能会执行相同的内联,但更早或更早地执行此操作或花费更少的精力。通常使用 Java 微基准测试,您应该多次运行代码,从中可以计算出启动时间、平均时间和偏差。

于 2010-03-04T23:46:55.877 回答
1

第二个应该更快,因为.size()不必在每次执行循环时都调用。说一次 1+2=3 比说多次要快得多。

于 2010-03-04T23:18:15.257 回答
1

第二种实现更快是有道理的,因为您存储了变量的单个最终本地副本。编译器必须弄清楚循环内的大小不能改变,以使性能大致相等。

一个问题是——这种微优化真的重要吗?如果是这样,请使用测试中运行速度更快且不依赖编译器优化的内容。

于 2010-03-04T23:19:54.933 回答
0

With the last example you will not need to resolve the current size of the array so it will be slightly faster then the first example.

Just remember that this is only useful if you don't change the number of values in your array.

In Android it is recommended to use the latest example in there example, Designing for Performance. http://developer.android.com/guide/practices/design/performance.html#foreach

于 2010-03-04T23:29:11.683 回答
0

在“编译器优化”的情况下,你能做的最好的就是 for-each 循环:

for(final String x : myList) { ... }

这让编译器提供了最快的实现。

编辑:

您的代码示例之间的区别在于 for 循环的第二个参数。在第一个示例中,VM 将执行方法调用(更昂贵),因此速度较慢(仅在有大量迭代时才有意义)。在您的第二个示例中,VM 将执行堆栈弹出(成本较低,并且局部变量在堆栈上),因此速度更快(仅在有很多迭代时才有意义:对于一次迭代,第一个更快,在内存使用方面)。

另外:“过早的优化是万恶之源。” Donald Knuth 臭名昭著的法律。

于 2010-03-04T23:13:36.763 回答
0

不同之处在于每次迭代的方法调用更少,因此第二个版本应该运行得稍微快一些。尽管如果您使用 Just-In-Time 编译器,他可能会优化它 - 弄清楚它在循环期间不会改变。标准 Java 实现具有 JIT 特性,但并非每个 Java 实现都具有 JIT 特性。

于 2010-03-04T23:15:29.963 回答
0

与往常一样,您将不得不同时运行它们以查看给定您正在使用的实现哪个更快。但是,第一个具有潜在的性能损失,即每次迭代都必须调用 size() ,并且函数调用比简单地直接检查变量更昂贵。但是,根据您的代码和编译器的功能,该函数调用可能会被优化掉,因此您必须运行测试才能看到。

但是,正如 Pindatjuh 所指出的,当您要像这样迭代整个集合时,最好使用 foreach 循环。它应该让编译器更好地优化事物并且不易出错。

于 2010-03-04T23:19:39.130 回答