14

最近,我对一些 Java 集合没有方法 size() 的恒定时间操作这一事实感到惊讶。

虽然我了解到集合的并发实现为了获得并发性(ConcurrentLinkedQueue、ConcurrentSkipListSet、LinkedTransferQueue 等中的大小为 O(n))而做出了一些妥协,但好消息是这在 API 文档中得到了正确记录。

让我担心的是某些集合的方法返回的视图上的方法大小的性能。例如,TreeSet.tailSet返回其元素大于或等于 fromElement 的支持集部分的视图。令我惊讶的是,在返回的 SortedSet 上调用 size 在时间上是线性的,即 O(n)。至少那是我设法从 OpenJDK 的源代码中挖掘出来的:在 TreeSet 中作为 TreeMap 的包装器实现,在 TreeMap 中,有一个 EntrySetView 类,其大小方法如下:

abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
    private transient int size = -1, sizeModCount;

    public int size() {
        if (fromStart && toEnd)
            return m.size();
        if (size == -1 || sizeModCount != m.modCount) {
            sizeModCount = m.modCount;
            size = 0;
            Iterator i = iterator();
            while (i.hasNext()) {
                size++;
                i.next();
            }
        }
        return size;
    }

    ....
}

这意味着第一次调用大小是 O(n),然后只要不修改支持映射,它就会被缓存。我无法在 API 文档中找到这个事实。更有效的实现将是 O(log n),在子树大小的缓存中进行内存权衡。由于为避免代码重复(TreeSet 作为 TreeMap 的包装器)而进行了此类权衡,因此我看不出出于性能原因不应制作它们的原因。

不管我对 TreeSet 的 OpenJDK 实现的(非常简短的)分析是对还是错,我想知道是否有关于许多此类操作的性能的详细而完整的文档,尤其是那些完全出乎意料的操作?

4

1 回答 1

3

例如,TreeSet.tailSet返回元素大于或等于 的支持集部分的视图fromElement。令我惊讶的是,size返回的调用SortedSet在时间上是线性的,即O(n).

对我来说这并不奇怪。考虑 javadoc 中的这句话:

“返回的集合是由这个集合支持的,所以返回集合中的变化会反映在这个集合中,反之亦然。”

Since the tail set is a dynamic view of the backing set, it follows that its size has to be calculated dynamically in practice. The alternative would require that when a change was made to the backing set, it would have to adjust the sizes of all extant tailset (and headset) views. That would make updates to the backing set more expensive, AND it would present a storage management problem. (In order to update the view sizes, the backing set would need references to all existing view sets ... and that is a potential hidden memory leak.)

Now you do have a point regarding the documentation. But in fact, the javadocs says nothing about the complexity of the view collections. And, indeed, it doesn't even document that TreeSet.size() is O(1)! In fact, it only documents the complexity of the add, remove and contains operations.


I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

AFAIK, No. Certainly, not from Sun / Oracle ...

于 2013-03-29T13:16:07.257 回答