
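I tried to write a small program to demonstrate hash collisions in Java when a class overrides only equals() and not hashCode(). The goal was to prove the theory that two unequal objects can have the same hash code. This is for an interview question that asks about this behavior.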

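I created 200,000 objects, stored them in an array, and then compared them to see which ones had duplicate hash codes. (To do this, I iterate over the object array with nested for loops after the object-creation phase.) For about 200,000 objects I get 9 collisions. The first one is between the objects at indexes 196 and 121949. I then print these hash codes to show that the two values are the same.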
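The nested loop described above performs roughly n²/2 comparisons, about 2×10^10 for 200,000 objects. A minimal sketch of a one-pass alternative groups indexes by hash code in a HashMap and reports every colliding pair (i, j) with i < j. The class and method names (CollisionFinder, findCollisions) are hypothetical. Like the nested loop, it first calls hashCode() on the objects in index order, so with identity hashes it should plausibly see the same values, but that is an assumption, not something the question verified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionFinder {

    // Returns every pair (i, j), i < j, whose hashCode() values are equal.
    // One pass over the array instead of the ~n^2/2 comparisons of a nested loop.
    static List<int[]> findCollisions(Object[] objs) {
        Map<Integer, List<Integer>> seen = new HashMap<>();
        List<int[]> pairs = new ArrayList<>();
        for (int j = 0; j < objs.length; j++) {
            // hashCode() is first called in index order 0, 1, 2, ...
            int h = objs[j].hashCode();
            List<Integer> earlier = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int i : earlier) {
                pairs.add(new int[] {i, j}); // every earlier index with the same hash
            }
            earlier.add(j);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known String.hashCode() collision (both 2112)
        Object[] objs = {"Aa", "x", "BB"};
        for (int[] p : findCollisions(objs)) {
            System.out.println("Indexes " + p[0] + " and " + p[1]
                    + " collided with hash code " + objs[p[0]].hashCode());
        }
    }
}
```

Applied to the question's array, findCollisions(objs) returns the pairs without the quadratic scan; the reported pairs are sorted by the second index rather than the first.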

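However, I am seeing some very surprising behavior. If I run the nested for loop and print the hash codes of the first collision, I get the same hash code value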

1867750575
1867750575 

for the objects at indexes 196 and 121949.

However, if I comment out the nested for loop that detects all collisions and directly print the hash codes of the elements at indexes 196 and 121949, I get

1829164700
366712642

Note that I did not comment out the creation of these elements, only the part where I check for collisions.

Why does this happen? Even if I don't iterate over them, shouldn't the hash codes be consistent?

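Addendum 1: Is there a source that explains this? As far as I know, by the birthday principle, if I create 200,000 objects I am bound to get collisions, so how does iterating over every hash code (or not) change anything?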
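The birthday estimate behind Addendum 1 can be worked out directly: for n values drawn uniformly and independently from m possibilities, the expected number of colliding pairs is C(n, 2)/m, and P(at least one collision) ≈ 1 − exp(−C(n, 2)/m). The sketch below assumes identity hash codes behave like uniform 31-bit values (all the hash codes printed in the question are below 2^31); that is an assumption about the JVM, not a documented guarantee. BirthdayEstimate and its methods are hypothetical names.

```java
public class BirthdayEstimate {

    // Expected number of colliding pairs when n values are drawn uniformly
    // and independently from a space of size m: C(n, 2) / m.
    static double expectedCollidingPairs(long n, double m) {
        return (double) n * (n - 1) / 2.0 / m;
    }

    // Poisson approximation: P(at least one collision) = 1 - exp(-expected pairs).
    static double probabilityOfAnyCollision(long n, double m) {
        return 1.0 - Math.exp(-expectedCollidingPairs(n, m));
    }

    public static void main(String[] args) {
        long n = 200_000;
        // Assumption: identity hash codes behave like uniform 31-bit values
        double m = Math.pow(2, 31);
        System.out.printf("expected colliding pairs: %.2f%n", expectedCollidingPairs(n, m));
        System.out.printf("P(at least one collision): %.5f%n", probabilityOfAnyCollision(n, m));
    }
}
```

Under this assumption the expected count is about 9.31 pairs, which lines up with the 9 collisions reported, and a collision somewhere is close to certain. The estimate says how many collisions to expect, not which objects collide; that depends on which hash value each object actually receives.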

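Addendum 2: I tried adding another array of size 200,000 just to see whether the collision indexes would change, but they did not. So apparently changes to the binary with the loop still commented out don't change anything, and the hypothesis that changing the binary changes the hash codes doesn't hold.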

Here is my code:

import java.util.HashMap;

public class EmployeeFactory {

    private static int counter = 0;
    public int id;
    public String empName;

    EmployeeFactory() {
        id = counter;
        empName = "employee_" + id;
        counter++;
    }

    @Override
    public boolean equals(Object o) {

        // If the object is compared with itself then return true
        if (o == this) {
            return true;
        }

        if (o == null || o.getClass() != this.getClass()) {
            return false;
        }

        EmployeeFactory emp = (EmployeeFactory) o;

        // Compare the data members and return accordingly
        return this.id == emp.id;
    }

    public static void main(String[] args) {

        int Obj_Count = 200000;

        EmployeeFactory objs[] = new EmployeeFactory[Obj_Count];
        for (int i = 0; i < Obj_Count; ++i) {
            EmployeeFactory temp = new EmployeeFactory();
            objs[i] = temp;
        }


        // Please run the code once with the loop below uncommented and once with it commented out.
 /*   
        for (int i = 0; i < Obj_Count; ++i)
        {
            for (int j = i + 1; j < Obj_Count; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                                     + " and " + objs[j].id + " collided.");
                    System.out.println("Object Is " + i + "and Obj ID is "+ objs[i].id + " Has Hashcode " + objs[i].hashCode());
                    System.out.println("Object Is " + j + "and Obj ID is "+ objs[j].id + " Has Hashcode " + objs[j].hashCode());
                    System.out.println("");
                }
            }
        }
        */

        HashMap<EmployeeFactory, EmployeeFactory> hm = new HashMap<EmployeeFactory, EmployeeFactory>();
        objs[121949].id = objs[196].id;
        hm.put(objs[196], objs[196]);
        hm.put(objs[121949], objs[121949]);
        System.out.println(hm.get(objs[121949]).empName);
        System.out.println(hm.get(objs[196]).empName);

        // checking the hashmap
        System.out.println(hm.get(objs[121949]).hashCode());
        System.out.println(hm.get(objs[196]).hashCode());

        // Checking the array
        System.out.println(objs[121949].hashCode());
        System.out.println(objs[196].hashCode());

    }

}

Output with the loop commented out:

employee_121949
employee_196
1829164700
366712642
1829164700
366712642

Output with the loop uncommented:

Objects with IDs 196 and 121949 collided.
Object Is 196and Obj ID is 196 Has Hashcode 1867750575
Object Is 121949and Obj ID is 121949 Has Hashcode 1867750575

Objects with IDs 62082 and 145472 collided.
Object Is 62082and Obj ID is 62082 Has Hashcode 2038112324
Object Is 145472and Obj ID is 145472 Has Hashcode 2038112324

Objects with IDs 62354 and 105841 collided.
Object Is 62354and Obj ID is 62354 Has Hashcode 2134400190
Object Is 105841and Obj ID is 105841 Has Hashcode 2134400190

Objects with IDs 68579 and 186938 collided.
Object Is 68579and Obj ID is 68579 Has Hashcode 1872358815
Object Is 186938and Obj ID is 186938 Has Hashcode 1872358815

Objects with IDs 105219 and 111288 collided.
Object Is 105219and Obj ID is 105219 Has Hashcode 651156501
Object Is 111288and Obj ID is 111288 Has Hashcode 651156501

Objects with IDs 107634 and 152385 collided.
Object Is 107634and Obj ID is 107634 Has Hashcode 273791087
Object Is 152385and Obj ID is 152385 Has Hashcode 273791087

Objects with IDs 108007 and 146405 collided.
Object Is 108007and Obj ID is 108007 Has Hashcode 1164664992
Object Is 146405and Obj ID is 146405 Has Hashcode 1164664992

Objects with IDs 135275 and 180997 collided.
Object Is 135275and Obj ID is 135275 Has Hashcode 996371445
Object Is 180997and Obj ID is 180997 Has Hashcode 996371445

Objects with IDs 153749 and 184310 collided.
Object Is 153749and Obj ID is 153749 Has Hashcode 254720071
Object Is 184310and Obj ID is 184310 Has Hashcode 254720071

employee_121949
employee_121949
1867750575
1867750575
1867750575
1867750575

2 Answers


When you don't override hashCode(), you get the identity hash code function inherited from class Object.
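A minimal sketch of what that means for a class like EmployeeFactory: the hypothetical Key class below overrides only equals(), so its hashCode() is the inherited identity hash and returns the same value as System.identityHashCode(). Two "equal" keys therefore almost always carry different hash codes, and HashMap, which compares stored hash codes before calling equals(), will normally fail to find one using the other.

```java
import java.util.HashMap;
import java.util.Map;

public class EqualsOnlyDemo {

    // Hypothetical class that, like EmployeeFactory, overrides equals() but not hashCode()
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        // No hashCode() override: Object.hashCode() (the identity hash) is inherited
    }

    public static void main(String[] args) {
        Key a = new Key(7);
        Key b = new Key(7);

        System.out.println("a.equals(b): " + a.equals(b));
        // Without an override, hashCode() matches System.identityHashCode()
        System.out.println(a.hashCode() == System.identityHashCode(a));

        // get(b) returns null unless the two identity hashes happen to be equal
        Map<Key, String> map = new HashMap<>();
        map.put(a, "found");
        System.out.println("lookup with b: " + map.get(b));
    }
}
```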

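The identity hash code depends on things you can't see, which could in theory change every time you run the program, such as where the object ends up in memory, how many objects were created before it, and so on. You can't count on any consistency in identity hash values between different runs of a program or method.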

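However, if you run exactly the same program twice, and it's not too big, there's a very good chance you'll end up with the same hash values both times. If you change the program, though, you change how much memory is consumed by loading and compiling classes, and that is very likely to change the identity hashes by changing the memory locations your objects end up at.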
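If repeatable hash codes are actually needed, the usual remedy is to override hashCode() from the same fields that equals() compares, so equal objects always hash equally and the value no longer depends on identity. The sketch below is a hypothetical Employee class modeled on the question's EmployeeFactory, with both methods derived from id; it is an illustration, not the asker's code.

```java
import java.util.HashMap;
import java.util.Map;

public class ConsistentHashDemo {

    // Sketch of an employee class whose hashCode() agrees with equals():
    // both depend only on id, so equal objects always hash equally,
    // and the value is the same on every run of the program.
    static final class Employee {
        final int id;
        final String name;

        Employee(int id) {
            this.id = id;
            this.name = "employee_" + id;
        }

        @Override
        public boolean equals(Object o) {
            if (o == this) return true;
            if (o == null || o.getClass() != getClass()) return false;
            return ((Employee) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id); // depends only on id, never on object identity
        }
    }

    public static void main(String[] args) {
        Employee a = new Employee(196);
        Employee b = new Employee(196);
        Map<Employee, String> map = new HashMap<>();
        map.put(a, a.name);
        System.out.println(map.get(b)); // prints employee_196
    }
}
```

Because two distinct ids still map to distinct ints here, this class can never show the collisions the question wanted to demonstrate; it only shows the contract that equal objects must have equal hash codes.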

answered 2017-01-13T13:07:38.117

Matt Timmermans' answer covers the basic issue well, particularly "You can't count on any consistency ... between different runs". (+1)

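Currently, in HotSpot, the default Object.hashCode() implementation (also called the identity hash code, since it is the same as System.identityHashCode(obj)) is simply a pseudo-random number with a thread-local seed. It has not depended on the object's memory address for quite some time. If your program's execution is completely deterministic, the hashes are very likely to be repeatable.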
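A simplified model may make this concrete. The sketch assumes a per-thread Marsaglia xor-shift generator with fixed textbook seeds and a 31-bit mask; HotSpot's real generator, seeding, and bookkeeping are internal details that differ, so treat XorShiftHasher, Model, and hashOf as hypothetical. The point it illustrates is that the value an object receives depends only on the seed and on how many hashes the thread handed out before it. In the question's looped run, objs[196] is hashed after objs[0] through objs[195]; with the loop commented out, it is among the first objects hashed (by hm.put). Under this model that would plausibly explain why the printed hash codes change.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class LazyIdentityHashModel {

    // Simplified per-thread xor-shift generator (Marsaglia style).
    // Assumption: HotSpot's real generator differs in its details.
    static final class XorShiftHasher {
        private int x, y, z, w;

        XorShiftHasher(int x, int y, int z, int w) {
            this.x = x; this.y = y; this.z = z; this.w = w;
        }

        int next() {
            int t = x ^ (x << 11);
            x = y; y = z; z = w;
            w = (w ^ (w >>> 19)) ^ (t ^ (t >>> 8));
            int v = w & 0x7FFFFFFF; // keep 31 bits, like the values in the question
            return v == 0 ? 1 : v;  // this sketch reserves 0 for "no hash assigned yet"
        }
    }

    // Assigns a hash the first time an object is asked for one, then remembers it.
    static final class Model {
        private final XorShiftHasher gen =
                new XorShiftHasher(123456789, 362436069, 521288629, 88675123);
        // Keyed by reference identity, like a hash stored in the object itself
        private final Map<Object, Integer> assigned = new IdentityHashMap<>();

        int hashOf(Object o) {
            return assigned.computeIfAbsent(o, k -> gen.next());
        }
    }

    public static void main(String[] args) {
        Object[] objs = new Object[200];
        for (int i = 0; i < objs.length; i++) objs[i] = new Object();

        // Order 1: every object touched in index order (like the nested loop)
        Model looped = new Model();
        for (Object o : objs) looped.hashOf(o);

        // Order 2: index 196 touched first (like hm.put(objs[196], objs[196]))
        Model direct = new Model();
        direct.hashOf(objs[196]);

        System.out.println("looped: " + looped.hashOf(objs[196]));
        System.out.println("direct: " + direct.hashOf(objs[196]));
    }
}
```

Running the same order twice with the same seed reproduces every value, which matches the answer's point that a fully deterministic execution tends to give repeatable hashes.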

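Also note that the identity hash code is generated lazily, on the first call to Object.hashCode() or System.identityHashCode(), and the value is stored in the object, so subsequent calls on that object return the same value. If you run your collision-detector loop in another thread, you will get completely different hash values and therefore different collisions.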
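A small sketch of the "generated lazily, then stored" behavior: the first hashCode() call on an object happens on a worker thread, and afterwards the main thread sees the same value, as do all later calls. The class and method names are hypothetical; the assertions rely only on the documented guarantee that hashCode() is consistent for an unmodified object and on Thread.join() making the worker's write visible.

```java
public class IdentityHashStability {

    // Computes o's hash code for the first time on a separate thread and returns it.
    static int hashFromWorkerThread(Object o) {
        int[] seen = new int[1];
        Thread worker = new Thread(() -> seen[0] = o.hashCode());
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        Object o = new Object();
        int fromWorker = hashFromWorkerThread(o);

        // Once assigned, the value stays with the object, whichever thread asks
        System.out.println(fromWorker == o.hashCode());
        System.out.println(o.hashCode() == System.identityHashCode(o));
    }
}
```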

answered 2017-01-13T23:19:05.167