
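I just wrote a benchmark to compare the access performance of local variables, member variables, member variables of another object, and getters/setters. The benchmark increments the variable in a loop of 10 million iterations. This is the output: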

BENCHMARK: local 101, member 1697, foreign member 151, getter setter 268

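This was run on a Motorola XOOM tablet with Android 3.2. The numbers are execution times in milliseconds. Can anyone explain the outlier for member variables to me, especially compared with the member variables of another object? Based on these numbers, it seems worthwhile to copy member variables into local variables before using their values in calculations. By the way, I ran the same benchmark on an HTC One X with Android 4.1, and it showed the same outlier.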
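A minimal sketch of the copy-to-local idea, assuming a plain Java class (the name HoistExample is hypothetical, not from the question). Both methods leave the field with the same value; the second touches the field only once before and once after the loop. Writing back once is only equivalent as long as no other thread needs to see the intermediate values.

```java
class HoistExample {
    private int mID;

    // Increments the field directly on every iteration.
    void incrementField(int n) {
        for (int k = 0; k < n; k++) {
            mID++;
        }
    }

    // Copies the field into a local, increments the local, writes it back once.
    void incrementViaLocal(int n) {
        int id = mID;
        for (int k = 0; k < n; k++) {
            id++;
        }
        mID = id;
    }

    int getID() {
        return mID;
    }

    public static void main(String[] args) {
        HoistExample a = new HoistExample();
        HoistExample b = new HoistExample();
        a.incrementField(1_000);
        b.incrementViaLocal(1_000);
        System.out.println(a.getID() + " " + b.getID()); // prints "1000 1000"
    }
}
```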

Are these numbers plausible, or am I missing a systematic error?

Here is the benchmark function:

private int mID;

public void testMemberAccess() {
    // compare access times for local variables, members, members of other classes
    // and getter/setter functions
    final int numIterations = 10000000;
    final Item item = new Item();
    int i = 0;

    long start = SystemClock.elapsedRealtime(); 
    for (int k = 0; k < numIterations; k++) {
        mID++;
    }
    long member = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime(); 
    for (int k = 0; k < numIterations; k++) {
        item.mID++;
    }
    long foreignMember = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime(); 
    for (int k = 0; k < numIterations; k++) {
        item.setID(item.getID() + 1);

    }
    long getterSetter = SystemClock.elapsedRealtime() - start;

    start = SystemClock.elapsedRealtime(); 
    for (int k = 0; k < numIterations; k++) {
        i++;
    }
    long local = SystemClock.elapsedRealtime() - start;

    // use every result so that (hopefully) the loops aren't optimized away
    final int dummy = item.mID + i + mID;  
    Log.d(Game.ENGINE_NAME, String.format("BENCHMARK: local %d, member %d, foreign member %d, getter setter %d, dummy %d",
            local, member, foreignMember, getterSetter, dummy));
}

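Edit:
I put each loop in its own function and called them in random order 100 times. Result: BENCHMARK: local 100, member 168, foreign member 190, getter setter 271. Looks good, thanks. The foreign object is created as a final class member, not inside the function.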
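The setup described in the edit might look roughly like the sketch below. This is an assumed reconstruction, not the asker's code: "100 times" is read as 100 rounds with a freshly shuffled order each round, System.nanoTime() replaces Android's SystemClock.elapsedRealtime() so it runs on a desktop JVM, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomOrderBenchmark {
    static class Item {
        int mID;
        int getID() { return mID; }
        void setID(int id) { mID = id; }
    }

    final int iterations;
    final int rounds;
    int mID;                       // "member" case
    final Item item = new Item();  // "foreign member" case: a final field, as in the edit
    long localSink;                // keeps the local-loop result observable

    RandomOrderBenchmark(int iterations, int rounds) {
        this.iterations = iterations;
        this.rounds = rounds;
    }

    void local() {
        int i = 0;
        for (int k = 0; k < iterations; k++) i++;
        localSink += i;
    }

    void member() {
        for (int k = 0; k < iterations; k++) mID++;
    }

    void foreignMember() {
        for (int k = 0; k < iterations; k++) item.mID++;
    }

    void getterSetter() {
        for (int k = 0; k < iterations; k++) item.setID(item.getID() + 1);
    }

    // Runs every case once per round in a shuffled order and returns the
    // accumulated nanoseconds: [local, member, foreign member, getter/setter].
    long[] run(long seed) {
        long[] totals = new long[4];
        List<Integer> order = new ArrayList<>(Arrays.asList(0, 1, 2, 3));
        Random rnd = new Random(seed);
        for (int r = 0; r < rounds; r++) {
            Collections.shuffle(order, rnd);
            for (int c : order) {
                long start = System.nanoTime();
                switch (c) {
                    case 0: local(); break;
                    case 1: member(); break;
                    case 2: foreignMember(); break;
                    default: getterSetter(); break;
                }
                totals[c] += System.nanoTime() - start;
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        RandomOrderBenchmark b = new RandomOrderBenchmark(10_000_000, 100);
        long[] t = b.run(42L);
        System.out.printf("BENCHMARK: local %d, member %d, foreign member %d, getter setter %d (ms), check %d%n",
                t[0] / 1_000_000, t[1] / 1_000_000, t[2] / 1_000_000, t[3] / 1_000_000,
                b.localSink + b.mID + b.item.mID);
    }
}
```

Shuffling each round spreads warm-up and cache effects across all four cases instead of charging them to whichever loop happens to run first.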


2 Answers


Well, I'd say that the Dalvik VM's optimizer is pretty smart ;-) I do know that the Dalvik VM is register-based. I don't know the guts of the Dalvik VM, but I would assume that the following is going on (more or less):

In the local case, you are incrementing a method-local variable inside a loop. The optimizer recognizes that this variable isn't accessed until the loop is completed, so it can keep the value in a register, apply the increments there until the loop is complete, and then store the value back into the local variable. This yields: 1 fetch, 10000000 register increments and 1 store.

In the member case, you are incrementing a member variable inside a loop. The optimizer cannot determine whether or not the member variable is accessed while the loop is running (by another method, object or thread), so it is forced to fetch, increment and store the value back into the member variable on each loop iteration. This yields: 10000000 fetches, 10000000 increments and 10000000 store operations.

In the foreign member case, you are incrementing a member variable of an object inside a loop. You have created that object within the method. The optimizer recognizes that this object cannot be accessed (by another object, method or thread) until the loop is completed, so it can use a register, apply the increments there until the loop is complete, and then store the value back into the foreign member variable. This yields: 1 fetch, 10000000 register increments and 1 store.

In the getter/setter case, I am going to assume that the compiler and/or optimizer is smart enough to "inline" getters/setters (i.e., it doesn't really make a method call; it replaces item.setID(item.getID() + 1) with item.mID = item.mID + 1). The optimizer recognizes that you are incrementing a member variable of an object inside a loop. You have created that object within the method. The optimizer recognizes that this object cannot be accessed (by another object, method or thread) until the loop is completed, so it can use a register, apply the increments there until the loop is complete, and then store the value back into the foreign member variable. This yields: 1 fetch, 10000000 register increments and 1 store.
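To make the inlining assumption concrete: if Item has trivial accessors like the ones below (its definition is not shown in the question, so this shape is assumed), then item.setID(item.getID() + 1) does exactly the same work as item.mID = item.mID + 1, which is what would let an inliner treat the getter/setter loop like the foreign member loop. The class name InlineExample is hypothetical.

```java
class InlineExample {
    // Assumed shape of the question's Item class (not shown in the original).
    static class Item {
        int mID;
        int getID() { return mID; }
        void setID(int id) { mID = id; }
    }

    // The loop as written in the question.
    static int viaAccessors(int n) {
        Item item = new Item();
        for (int k = 0; k < n; k++) {
            item.setID(item.getID() + 1);
        }
        return item.mID;
    }

    // What the loop body becomes once both accessors are inlined by hand.
    static int inlinedByHand(int n) {
        Item item = new Item();
        for (int k = 0; k < n; k++) {
            item.mID = item.mID + 1;
        }
        return item.mID;
    }

    public static void main(String[] args) {
        System.out.println(viaAccessors(500) + " " + inlinedByHand(500)); // prints "500 500"
    }
}
```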

I can't really explain why the getter/setter timing is twice the foreign member timing, but this may be due to the time it takes the optimizer to figure it out, or something else.

An interesting test would be to move the creation of the foreign object out of the method and see if that changes anything. Try moving this line:

final Item item = new Item();

outside of the method (i.e., declare it as a private member variable of some object instead). I would guess that the performance would be much worse.
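The suggested experiment, sketched with a hypothetical FieldItemBenchmark class: item becomes a private final field, so the object is reachable from other methods while the loop runs. Timing uses System.nanoTime() rather than Android's SystemClock so the snippet runs on a desktop JVM; the Item class here is a stripped-down assumption.

```java
class FieldItemBenchmark {
    static class Item {
        int mID;
    }

    // The object now lives in a field instead of being created inside the method.
    private final Item item = new Item();

    // Increments item.mID n times and returns the elapsed time in nanoseconds.
    long timeForeignMember(int n) {
        long start = System.nanoTime();
        for (int k = 0; k < n; k++) {
            item.mID++;
        }
        return System.nanoTime() - start;
    }

    int currentID() {
        return item.mID;
    }

    public static void main(String[] args) {
        FieldItemBenchmark b = new FieldItemBenchmark();
        long ns = b.timeForeignMember(10_000_000);
        System.out.println("foreign member (field): " + ns / 1_000_000 + " ms, mID " + b.currentID());
    }
}
```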

Disclaimer: I'm not a Dalvik engineer.

answered 2013-02-13T20:45:55.203

Besides changing their order, there are other things you can do to try to eliminate any interference:

1- Eliminate boundary effects by timing the first item a second time, preferably into another long variable.

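2- Increase the number of iterations by a factor of 10. 1000000 seems like a big number, but as the first suggestion should show, incrementing a variable a million times on a modern CPU is so fast that many other things, such as filling the various caches, become significant.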

3- Add dummy instructions, such as inserting a dummy long l = SystemClock.elapsedRealtime() - start computation. This will help show that these 1000000 iterations really are a small number.

4- Add the volatile keyword to the mID field. This is probably the best way to rule out any compiler- or CPU-related optimizations.
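Suggestions 1 and 4 can be combined in one small sketch (hypothetical class name; System.nanoTime() stands in for Android's SystemClock.elapsedRealtime() so it runs on a desktop JVM). The field is volatile, so every mID++ must really read and write the field, and the first measurement is repeated at the end into its own long. Note that volatile adds its own memory-ordering cost, so this measures a volatile field rather than an ordinary one.

```java
class RepeatedVolatileBenchmark {
    // volatile: every read and write of mID must actually be performed, so the
    // loop cannot be collapsed into one register increment. mID++ is still not
    // atomic, which is fine here because only one thread touches the field.
    private volatile int mID;

    // Times n increments of the volatile member field, in nanoseconds.
    long timeMember(int n) {
        long start = System.nanoTime();
        for (int k = 0; k < n; k++) {
            mID++;
        }
        return System.nanoTime() - start;
    }

    // Times n increments of a local variable; the result is added to mID
    // afterwards so the loop has an observable effect.
    long timeLocal(int n) {
        long start = System.nanoTime();
        int i = 0;
        for (int k = 0; k < n; k++) {
            i++;
        }
        long elapsed = System.nanoTime() - start;
        mID += i;
        return elapsed;
    }

    // Member, local, then the member loop again (suggestion 1).
    // Returns {memberFirst, local, memberSecond} in nanoseconds.
    long[] run(int n) {
        long memberFirst = timeMember(n);
        long local = timeLocal(n);
        long memberSecond = timeMember(n);
        return new long[] {memberFirst, local, memberSecond};
    }

    int getID() {
        return mID;
    }

    public static void main(String[] args) {
        RepeatedVolatileBenchmark b = new RepeatedVolatileBenchmark();
        long[] t = b.run(10_000_000);
        System.out.println("member (1st) " + t[0] / 1_000_000 + " ms, local " + t[1] / 1_000_000
                + " ms, member (2nd) " + t[2] / 1_000_000 + " ms, mID " + b.getID());
    }
}
```

If the two member timings differ a lot, that suggests warm-up effects (such as JIT compilation or cache filling) are distorting the first measurement rather than field access itself.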

answered 2013-02-14T07:33:53.523