java - Using a cached hashCode() value in equals()

Question

The String class is immutable, so the String::hashCode computation happens only once in the lifetime of an object. Every call to hashCode() after the first simply returns the cached private int hash field, so no CPU is wasted on recomputation.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {    // h != 0 on the second call, so the loop is skipped
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
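
The same lazy-caching pattern can be applied to your own immutable classes. Below is a minimal sketch, assuming a hypothetical CachedKey class (not part of the JDK) whose computations counter exists only to show how often the loop runs. As with the String code above, a value whose hash really is 0 would be recomputed on every call.

```java
import java.util.Arrays;

// Hypothetical immutable value class that caches its hash code lazily,
// mirroring the pattern in the String.hashCode() source quoted above.
final class CachedKey {
    private final char[] value;
    private int hash;      // 0 means "not computed yet" (or the hash really is 0)
    int computations;      // demonstration only: counts how often the loop runs

    CachedKey(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            computations++;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;      // remember the result for later calls
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey && Arrays.equals(value, ((CachedKey) o).value);
    }

    public static void main(String[] args) {
        CachedKey k = new CachedKey("hello");
        k.hashCode();
        k.hashCode();
        System.out.println(k.computations); // prints 1
    }
}
```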

As we know, the contract between hashCode() and equals() says:

equal objects must produce the same hash code

So I believe the piece of code below would improve the performance of String.equals(). It may be redundant in a HashMap, since the HashMap has already picked the bucket based on the hash code, but it should give a good performance improvement when searching a BIG List.

// Will the check below improve equals() performance on a BIG List?
if (this.hashCode() != anObject.hashCode()) {
    return false;
}
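
To make the "BIG List" scenario concrete, here is a rough sketch of the same idea applied outside String itself. The HashPrecheckSearch class and its indexOf helper are hypothetical names invented for this illustration. The hash comparison acts only as a cheap rejection filter; equals() is still needed to confirm a match.

```java
import java.util.ArrayList;
import java.util.List;

class HashPrecheckSearch {
    // Returns the index of the first element equal to target, or -1 if none.
    // Hypothetical helper: differing hash codes prove the strings differ,
    // matching hash codes still have to be confirmed with equals().
    static int indexOf(List<String> items, String target) {
        int targetHash = target.hashCode();
        for (int i = 0; i < items.size(); i++) {
            String candidate = items.get(i);
            if (candidate == null) {
                continue;
            }
            if (candidate.hashCode() != targetHash) {
                continue; // definitely not equal
            }
            if (candidate.equals(target)) {
                return i; // hashes matched and contents matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("Aa");    // same hash code as "BB", different content
        items.add("BB");
        items.add("hello");
        System.out.println(indexOf(items, "BB")); // prints 1
    }
}
```

Note that the stock String.equals() already rejects strings of different lengths before comparing characters, so the pre-check only helps for equal-length strings whose hash codes are already cached.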

Please comment with your thoughts on the API below.

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }

    if (anObject instanceof String) {
        String anotherString = (String) anObject;

        // Proposed check: equal strings must have equal hash codes.
        // Placed after instanceof so a null argument cannot throw.
        if (this.hashCode() != anotherString.hashCode()) {
            return false;
        }

        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

UPDATE

As 'dasblinkenlight' mentioned, some CPU cycles are required only the first time hashCode() is called.

Since Java maintains a String pool, if an application needs to compare large Strings many times outside of hashing, we could use a utility method like the one below.

public static boolean StringCompare(String one, String two) {
    // Same reference (this also covers two nulls)
    if (one == two) {
        return true;
    }

    if (one == null || two == null) {
        return false;
    }

    // Cheap rejection: equal strings must have equal hash codes
    if (one.hashCode() != two.hashCode()) {
        return false;
    }

    // Hash codes match, so fall back to the full comparison.
    // String's count, value and offset fields are private and
    // cannot be read from a utility method, so delegate to equals().
    return one.equals(two);
}

score 8 · Accepted Answer

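There is nothing wrong with making such a check in your own code to save on equality checks when you know that the comparison will fail most of the time, but putting it into general-purpose code could decrease the overall performance of your system, for two reasons:

Computing the hash code the first time takes some CPU cycles; when equals() is called outside the context of a hash container, the CPU cycles spent computing the hash code are wasted.
When Strings are used as keys in a hash container, the container establishes equality of the hash codes before performing the equality check, so the hash code comparison inside the equals() method becomes redundant.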
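
The second point can be observed with a small experiment. This is only a sketch: Key, equalsCalls and lookupCost are hypothetical names, and the counts in the comments reflect how OpenJDK's HashMap behaves (it compares stored hash values before calling equals()), which other implementations are not strictly required to match.

```java
import java.util.HashMap;
import java.util.Map;

class HashContainerDemo {
    static int equalsCalls = 0;

    // Hypothetical key type with a caller-chosen hash code,
    // so the demo controls when hashes match.
    static final class Key {
        final String name;
        final int hash;

        Key(String name, int hash) {
            this.name = name;
            this.hash = hash;
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++; // count every equality check the map performs
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    // Stores one key, looks up another, and reports how many
    // equals() calls the lookup triggered.
    static int lookupCost(Key stored, Key probe) {
        Map<Key, String> map = new HashMap<>();
        map.put(stored, "v");
        equalsCalls = 0;
        map.get(probe);
        return equalsCalls;
    }

    public static void main(String[] args) {
        // Different hash codes: on OpenJDK the map never calls equals()
        System.out.println(lookupCost(new Key("a", 1), new Key("b", 17)));
        // Same hash code, different content: equals() has to decide
        System.out.println(lookupCost(new Key("a", 1), new Key("b", 1)));
    }
}
```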

score 5 · Answer

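Java is very good at optimizing; you don't need to micro-optimize for it. You should write maintainable, readable code, and look at performance only after you have identified the source that may be causing a problem (and after plenty of performance testing). Writing obscure or hard-to-read code is likely to lead to bugs in the future, when you or someone else can't work out why a method was written that way.

Have you found that .equals() is your performance bottleneck? If not, I would say stick with the code that is more readable and less likely to introduce bugs later. The performance difference in your case is most likely negligible, but you can run tests to compare the two implementations.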

score 3 · Answer

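Your optimization would require more work for String objects that actually are equal to each other, since you still have to iterate over them to make sure they are equal.

The contract requires equal objects to produce the same hash code. The reverse is not true, since objects that produce the same hash code are not necessarily equal (hash collisions).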
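
A well-known collision illustrates why matching hash codes cannot replace the character comparison. The HashCollisionDemo class and its collidesButDiffers helper below are hypothetical names used only for this example.

```java
class HashCollisionDemo {
    // True when two strings share a hash code but are not equal,
    // which is the case where a hash-only comparison would be wrong.
    static boolean collidesButDiffers(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode()); // prints 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode()); // prints 2112
        System.out.println(collidesButDiffers("Aa", "BB")); // prints true
    }
}
```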
