1

我们正在对 Java 数据结构进行一些实证测试,并且得到了一些我们无法正确解释的结果。

例如,当我们测试 TreeSet 的 last-method 的时间要求应该是恒定的时,在 TreeSet 的大小超过 30 000 后,我们会遇到运行时间的颠簸。我们运行 last-method 时,treeSet 中每个大小的元素数量都会增加 1000 次然后取结果的中值。

相关代码为:

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import jxl.write.WriteException;

public class TestRunner {


public void test(Testable testeCase, String outputFileName, Integer... initArgs){
    ExcelWriter excel = null;
    try {
        excel = new ExcelWriter(outputFileName);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    int measurementsPoints = 35;
    //calculate median to every dataset size

    for(int j = 0; j < measurementsPoints; j++){
        int testCount = 1000;
        long startTime; 
        long endTime; 
        //double sum = 0;
        ArrayList<Integer> results = new ArrayList<Integer>();


        for (int i = 0; i < testCount; i++) {

            //initialize tested data structure
            testeCase.initTestRun(initArgs);

            startTime = threadMxBean.getCurrentThreadCpuTime();
            // run tested method
            testeCase.runTestMethod();
            endTime = threadMxBean.getCurrentThreadCpuTime();

            results.add((int)(endTime - startTime));

        }
        Collections.sort(results);
        excel.addNumber(j, 5, new Double(initArgs[0]));
        excel.addNumber(j, 6, new Double(results.get(testCount / 2)));

        //increase the size of the data structure 10, 15, 20, 30, 40, 60, 80...
        if(j % 2 == 0){
            initArgs[0]  = (int)(initArgs[0]  * 1.5);
        }
        else{
            initArgs[0] = (int)(initArgs[0]  / 3 * 4);
        }


    }
    try {
        excel.write();
    } catch (WriteException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

import java.util.TreeSet;


public class TreeSetLastTest implements Testable {

private TreeSet<Integer> values;



@Override
public void initTestRun(Integer... integers) {
    Integer initialCapacity = integers[0];
    values = new TreeSet<Integer>();
    for(int i = Integer.MIN_VALUE; i < Integer.MIN_VALUE + initialCapacity; i++){
        values.add(i);
    }


}

@Override
public void runTestMethod() {
    values.last();
}

}

当 treeSet 中的元素数量在 10-30 000 个元素之间时,测量的中位数为 3000 ns。当 treeSet 的大小增加到 120 000 个元素时,测量的中值增加到 13 000 ns,然后当元素数量增加到超过一百万时保持不变。那么导致增加的原因可能是什么,或者时间单位太小以至于差异在现实中毫无意义。谢谢您的帮助。

4

1 回答 1

4

Well I assume it's worth an answer.

Your assumption that a TreeSet has O(1) last() is erroneous. First the documentation doesn't state anything of this kind and in fact a TreeSet in java is implemented using a TreeMapwhich is an implementation of a red-black tree.

A red-black tree is similar to an AVL-tree which may be better known in that it guarantees O(log n) for lookups, i.e. makes sure the tree doesn't degenerate into a linked list. Basically your last() lookup has O(log n) complexity so will get worse as it gets larger.

Presumably because of caching, maybe even paging effects you don't see the O(log n) in your benchmark directly.

That's similar to LinkedLists and arrays - in theory linked lists have lots of things going for them, in practice linked lists are one of the worst data structures you can use on modern CPUs. Constant factors do matter after all and memory access patterns are large constant factors.

于 2012-05-08T10:08:34.333 回答