java - JVM optimisation of hashCode() on List

Question

Imagine a simple case:

class B{
    public final String text;
    public B(String text){
        this.text = text;
    }
}

class A {
    private List<B> bs = new ArrayList<B>();

    public B getB(String text){
        for(B b :bs){
           if(b.text.equals(text)){
               return b;
           }
        }
        return null;
    }

    [getter/setter]
}

Imagine that for each instance of A, the List<B> is large and we need to call getB(String) often. However, assume that the list can also change (elements added or removed, or the field even being reassigned).

At this stage, the average complexity of getB(String) is O(n). To improve that, I was wondering whether we could use some clever caching.

Imagine we cache the List<B> in a Map<String, B> where the key is B.text. That would improve the performance but it won't work if the list is changed (new element or deleted element) or reassigned (A.bs points to a new reference).

To get around that, I thought that, along with the Map<String, B>, we could store a hash of the list bs. When we call the getB(String) method, we compute the hash of the list bs. If the hash hasn't changed, we fetch the result from the map; if it has, we reload the map.

The problem is that computing the hash of a java.util.List goes through all the elements of the list and computes their hashes, which is at least O(n).

Question

What I'd like to know is whether the JVM will be faster at computing the hash of the List than going through my loop in the getB(String) method. Maybe that depends on the implementation of hashCode() for B. If so, what kind of things could work? In a nutshell, I'd like to know whether this is stupid or could bring some performance improvement.

score 5 · Accepted Answer

Without actually explaining why, you seem to believe that it is essential to keep the list structure as well. The only good reason for this is that you need the order of the collection to be kept consistent. If you switch to a "plain" map, the order of the values is no longer guaranteed, i.e. they are not kept in the order in which you added the items to the map.

If you need both to keep the order (list behaviour) and to access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Although LinkedHashMap.values() returns a Collection and not a List, iterating over that collection follows the insertion order by default.
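A minimal sketch of that idea, assuming B.text is unique within one A (the class name OrderedBs and its methods are made up for illustration, they are not from the question):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Same shape as B in the question.
class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

// Hypothetical replacement for A: a single LinkedHashMap instead of a List plus a cache.
class OrderedBs {
    private final Map<String, B> byText = new LinkedHashMap<>();

    // Assumption: text is unique. Re-adding an existing text replaces the value
    // but keeps the position of the first insertion.
    void add(B b) {
        byText.put(b.text, b);
    }

    B remove(String text) {
        return byText.remove(text);
    }

    // Average O(1) lookup instead of the O(n) loop.
    B getB(String text) {
        return byText.get(text);
    }

    // Snapshot of the values in insertion order, for callers that need a List.
    List<B> inOrder() {
        return new ArrayList<>(byText.values());
    }
}
```

If the same text can occur more than once, this does not work as written, because a map holds one value per key.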

Another issue with your question is that you cannot use the list's hash code to safely detect changes. If the hash code has changed, you can indeed be sure that the list has changed as well. But if two hash codes are identical, you still cannot be sure that the lists are actually identical. E.g. if the hash code implementation is based on strings, the hash codes for "1a" and "2B" are identical.
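The collision is easy to check. The snippet below uses lists of String rather than lists of B, since B in the question does not override hashCode(); the class and method names are only for the demo:

```java
import java.util.Arrays;
import java.util.List;

class HashCollisionDemo {
    // True when two lists have the same hash code, whether or not they are equal.
    static boolean sameHash(List<String> a, List<String> b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("1a");
        List<String> second = Arrays.asList("2B");

        System.out.println("1a".hashCode() == "2B".hashCode()); // true
        System.out.println(sameHash(first, second));            // true
        System.out.println(first.equals(second));               // false
    }
}
```

So a cache keyed on the list's hash code could keep serving stale entries after the list content has changed.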

score 1 · Accepted Answer

If so what kind of things could work?

Simply put: don't let anything else mutate your list without you knowing about it. I suspect you currently have something like:

public List<B> getAllBs() {
    return bs;
}

... and a similar setter. If you stop doing that, and instead just have appropriate mutation methods, then you can make sure that your code is the only code to mutate the list... which means you can either remember that your map is "dirty" or just mutate the map at the same time that you mutate the list.
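A rough sketch of the "dirty" variant, under the assumption that all changes go through A (the method names addB, removeB and setBs are invented here, and the getter hands out a read-only view):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Same shape as B in the question.
class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

class A {
    private final List<B> bs = new ArrayList<>();
    private final Map<String, B> index = new HashMap<>();
    private boolean dirty = false;

    // Hypothetical mutation methods: the only ways to change the list.
    void addB(B b) {
        bs.add(b);
        dirty = true;
    }

    boolean removeB(B b) {
        boolean removed = bs.remove(b);
        dirty |= removed;
        return removed;
    }

    // Copies the elements instead of keeping the caller's reference.
    void setBs(List<B> newBs) {
        bs.clear();
        bs.addAll(newBs);
        dirty = true;
    }

    // Read-only view, so callers cannot mutate the list behind our back.
    List<B> getAllBs() {
        return Collections.unmodifiableList(bs);
    }

    B getB(String text) {
        if (dirty) {
            // Rebuild once per batch of changes: O(n) here, O(1) on later lookups.
            index.clear();
            for (B b : bs) {
                index.putIfAbsent(b.text, b); // first match wins, like the original loop
            }
            dirty = false;
        }
        return index.get(text);
    }
}
```

The rebuild cost is paid only on the first getB(String) after a change, so many lookups between changes stay cheap.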

score 1 · Accepted Answer

You could implement your own class IndexedBArrayList which extends ArrayList<B>.

Then you add this functionality to it:

A private HashMap<String, B> index
All mutator methods of ArrayList are overridden to keep this index hash map updated in addition to calling the corresponding super-method.
A new public B getByString(String) method which uses the hash map
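Something along these lines, as an incomplete sketch: it assumes B.text is unique and that the list holds no nulls, and it overrides only a handful of mutators. Anything not overridden here (addAll, removeIf, iterator removal, subList and so on) would bypass the index and would need the same treatment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Same shape as B in the question.
class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

class IndexedBArrayList extends ArrayList<B> {
    // Assumption: each text occurs at most once in the list.
    private final Map<String, B> index = new HashMap<>();

    @Override
    public boolean add(B b) {
        index.put(b.text, b);
        return super.add(b);
    }

    @Override
    public void add(int position, B b) {
        super.add(position, b);
        index.put(b.text, b);
    }

    @Override
    public B set(int position, B b) {
        B old = super.set(position, b);
        index.remove(old.text);
        index.put(b.text, b);
        return old;
    }

    @Override
    public B remove(int position) {
        B old = super.remove(position);
        index.remove(old.text);
        return old;
    }

    @Override
    public boolean remove(Object o) {
        boolean removed = super.remove(o);
        if (removed) {
            // Safe cast: something was removed from a list of B, so o is a B.
            index.remove(((B) o).text);
        }
        return removed;
    }

    @Override
    public void clear() {
        super.clear();
        index.clear();
    }

    public B getByString(String text) {
        return index.get(text);
    }
}
```

The number of methods to cover is the main drawback of extending ArrayList; wrapping a list inside your own class keeps the set of mutators under your control.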

score 0 · Accepted Answer

From your description, it does not seem that you need a List<B>.
Replace the List with a HashMap. If you need to search for Bs by text, the best data structure is a HashMap, not a list.
