java - Java HashMap 如何处理具有相同哈希码的不同对象？

Question

根据我的理解，我认为：

两个对象具有相同的哈希码是完全合法的。
如果两个对象相等（使用 equals() 方法），则它们具有相同的哈希码。
如果两个对象不相等，则它们不能具有相同的哈希码

我对么？

现在如果是正确的，我有以下问题：HashMap内部使用对象的哈希码。那么如果两个对象可以有相同的hashcode，那么如何HashMap跟踪它使用了哪个key呢？

有人可以解释HashMap内部如何使用对象的哈希码吗？

score 376 · Accepted Answer

hashmap 是这样工作的（这有点简化，但它说明了基本机制）：

它有许多“桶”，用于存储键值对。每个桶都有一个唯一的数字——这就是标识桶的原因。当你把一个key-value对放入map中时，hashmap会查看key的hash code，并将pair存放在标识符为key的hash code的bucket中。例如：key的hash code是235 -> 对存储在235号桶中。（注意一个桶可以存储多于一个键值对）。

当你在 hashmap 中查找一个值时，通过给它一个键，它会首先查看你给出的键的哈希码。然后 hashmap 将查看相应的存储桶，然后将您提供的密钥与存储桶中所有对的密钥进行比较，并将它们与equals().

现在您可以看到这对于在映射中查找键值对非常有效：通过键的哈希码，哈希映射立即知道要在哪个桶中查找，因此它只需要针对该桶中的内容进行测试。

看上面的机制，也可以看出对keys的hashCode()和方法有什么要求：equals()

如果两个键相同（比较时equals()返回true），它们的hashCode()方法必须返回相同的数字。如果键违反了这一点，那么相等的键可能存储在不同的桶中，并且哈希映射将无法找到键值对（因为它会在同一个桶中查找）。
如果两个键不同，那么它们的哈希码是否相同都没有关系。如果它们的哈希码相同，它们将存储在同一个桶中，在这种情况下，哈希图将用于equals()区分它们。

score 95 · Accepted Answer

你的第三个断言是不正确的。

两个不相等的对象具有相同的哈希码是完全合法的。它被HashMap用作“第一次通过过滤器”，以便地图可以快速找到具有指定键的可能条目。然后测试具有相同哈希码的键与指定键的相等性。

您不希望要求两个不相等的对象不能具有相同的哈希码，否则这会将您限制为 2 ^{32 个}可能的对象。（这也意味着不同的类型甚至不能使用对象的字段来生成散列码，因为其他类可以生成相同的散列。）

score 83 · Accepted Answer

HashMap结构图

HashMap是一个Entry对象数组。

仅HashMap将其视为一组对象。

看看这Object是什么：

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
… 
}

每个Entry对象代表一个键值对。如果存储桶有多个，则该字段next引用另一个对象。EntryEntry

有时可能会发生 2 个不同对象的哈希码相同的情况。在这种情况下，两个对象将保存在一个桶中，并以链表的形式呈现。入口点是最近添加的对象。该对象引用另一个具有next字段的对象，依此类推。最后一项是指null.

当您HashMap使用默认构造函数创建

HashMap hashMap = new HashMap();

该数组的大小为 16，默认为 0.75 负载平衡。

添加新的键值对

计算密钥的哈希码
计算hash % (arrayLength-1)元素应该放置的位置（桶号）
如果您尝试使用已保存在中的键添加值HashMap，则值将被覆盖。
否则元素被添加到桶中。

如果桶已经有至少一个元素，则添加一个新元素并将其放置在桶的第一个位置。它的next字段指的是旧元素。

删除

计算给定键的哈希码
计算桶数hash % (arrayLength-1)
获取对桶中第一个 Entry 对象的引用，并通过 equals 方法遍历给定桶中的所有条目。最终我们会找到正确的Entry。如果没有找到想要的元素，返回null

score 38 · Accepted Answer

您可以在http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html找到极好的信息

总结：

HashMap 的工作原理是散列

put(key, value)： HashMap 将 key 和 value 对象都存储为 Map.Entry。Hashmap 应用 hashcode(key) 来获取桶。如果发生冲突，HashMap 使用 LinkedList 来存储对象。

get(key)： HashMap 使用 Key Object 的 hashcode 找出桶的位置，然后调用 keys.equals() 方法在 LinkedList 中识别正确的节点，并返回 Java HashMap 中该键的关联值对象。

score 23 · Accepted Answer

这是对HashMap's 机制的粗略描述，对于Java 8版本，（它可能与 Java 6 略有不同）。

数据结构

哈希表
哈希值是通过hash()键计算的，它决定哈希表的哪个桶用于给定的键。
链表 （单链表）
当桶中的元素数量很少时，使用单链表。
红黑树
当桶中的元素数量很大时，使用红黑树。

类（内部）

Map.Entry
表示映射中的单个实体，键/值实体。
HashMap.Node
节点的链表版本。

它可以代表：
- 一个哈希桶。
  因为它具有哈希属性。
- 单链表中的一个节点，（因此也是链表的头）。
HashMap.TreeNode
节点的树版本。

字段（内部）

Node[] table
桶表，（链表的头部）。
如果一个桶不包含元素，那么它是空的，因此只占用一个引用的空间。
Set<Map.Entry> entrySet 实体集。
int size
实体数量。
float loadFactor
指示在调整大小之前允许散列表的完整程度。
int threshold
要调整大小的下一个大小。
公式：threshold = capacity * loadFactor

方法（内部）

int hash(key)
按键计算哈希。

如何将哈希映射到存储桶？
使用以下逻辑：

static int hashToBucket(int tableSize, int hash) {
    return (tableSize - 1) & hash;
}

关于容量

在哈希表中，容量是指桶数，可以从table.length.
也可以通过thresholdand计算loadFactor，因此不需要定义为类字段。

可以通过以下方式获得有效容量： capacity()

运营

按键查找实体。
首先通过哈希值找到桶，然后循环链表或搜索排序树。
添加带有键的实体。
首先根据key的hash值找到bucket。
然后尝试找到值：
- 如果找到，请替换该值。
- 否则，在链表的开头添加一个新节点，或者插入到排序树中。
调整大小
当threshold达到时，会将哈希表的容量（table.length）加倍，然后对所有元素执行重新哈希以重建表。
这可能是一项昂贵的操作。

表现

get & put
时间复杂度为O(1)，因为：
- Bucket 通过数组索引访问，因此O(1).
- 每个桶中的链表长度较小，因此可以看作O(1)。
- 树的大小也是有限的，因为当元素数量增加时会扩展容量并重新散列，所以可以将其视为O(1)，而不是O(log N)。

score 15 · Accepted Answer

哈希码确定哈希映射要检查的存储桶。如果桶中有多个对象，则进行线性搜索以查找桶中的哪个项目等于所需的项目（使用equals()）方法。

换句话说，如果你有一个完美的哈希码，那么哈希图访问是不变的，你将永远不必遍历一个桶（从技术上讲，你还必须有 MAX_INT 个桶，Java 实现可能会在同一个桶中共享一些哈希码来减少空间需求）。如果您的哈希码最差（总是返回相同的数字），那么您的哈希图访问将变为线性，因为您必须搜索地图中的每个项目（它们都在同一个存储桶中）以获得您想要的。

大多数情况下，编写良好的哈希码并不完美，但其独特性足以让您或多或少地获得持续访问。

score 11 · Accepted Answer

你在第三点上错了。两个条目可以具有相同的哈希码但不相等。从 OpenJdk看一下 HashMap.get 的实现。您可以看到它检查哈希是否相等并且键是否相等。如果第三点是真的，那么就没有必要检查键是否相等。哈希码在键之前进行比较，因为前者是更有效的比较。

如果您有兴趣了解更多相关信息，请查看关于Open Addressing 冲突解决的 Wikipedia 文章，我相信这是 OpenJdk 实现使用的机制。该机制与其他答案之一提到的“桶”方法略有不同。

score 8 · Accepted Answer

import java.util.HashMap;

public class Students  {
    String name;
    int age;

    Students(String name, int age ){
        this.name = name;
        this.age=age;
    }

    @Override
    public int hashCode() {
        System.out.println("__hash__");
        final int prime = 31;
        int result = 1;
        result = prime * result + age;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        System.out.println("__eq__");
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Students other = (Students) obj;
        if (age != other.age)
            return false;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }

    public static void main(String[] args) {

        Students S1 = new Students("taj",22);
        Students S2 = new Students("taj",21);

        System.out.println(S1.hashCode());
        System.out.println(S2.hashCode());

        HashMap<Students,String > HM = new HashMap<Students,String > (); 
        HM.put(S1, "tajinder");
        HM.put(S2, "tajinder");
        System.out.println(HM.size());
    }
}

Output:

__ hash __

116232

__ hash __

116201

__ hash __

__ hash __

2

所以在这里我们看到，如果对象 S1 和 S2 有不同的内容，那么我们很确定我们重写的 Hashcode 方法将为这两个对象生成不同的 Hashcode(116232,11601)。现在因为有不同的哈希码，所以它甚至不会费心调用 EQUALS 方法。因为不同的 Hashcode 保证了对象中的不同内容。

    public static void main(String[] args) {

        Students S1 = new Students("taj",21);
        Students S2 = new Students("taj",21);

        System.out.println(S1.hashCode());
        System.out.println(S2.hashCode());

        HashMap<Students,String > HM = new HashMap<Students,String > (); 
        HM.put(S1, "tajinder");
        HM.put(S2, "tajinder");
        System.out.println(HM.size());
    }
}

Now lets change out main method a little bit. Output after this change is 

__ hash __

116201

__ hash __

116201

__ hash __

__ hash __

__ eq __

1
We can clearly see that equal method is called. Here is print statement __eq__, since we have same hashcode, then content of objects MAY or MAY not be similar. So program internally  calls Equal method to verify this. 


Conclusion 
If hashcode is different , equal method will not get called. 
if hashcode is same, equal method will get called.

Thanks , hope it helps.

score 6 · Accepted Answer

two objects are equal, implies that they have same hashcode, but not vice versa.

2 equal objects ------> they have same hashcode

2 objects have same hashcode ----xxxxx--> they are NOT equal

Java 8 update in HashMap-

you do this operation in your code -

myHashmap.put("old","old-value");
myHashMap.put("very-old","very-old-value");

so, suppose your hashcode returned for both keys "old" and "very-old" is same. Then what will happen.

myHashMap is a HashMap, and suppose that initially you didn't specify its capacity. So default capacity as per java is 16. So now as soon as you initialised hashmap using the new keyword, it created 16 buckets. now when you executed first statement-

myHashmap.put("old","old-value");

then hashcode for "old" is calculated, and because the hashcode could be very large integer too, so, java internally did this - (hash is hashcode here and >>> is right shift)

hash XOR hash >>> 16

so to give as a bigger picture, it will return some index, which would be between 0 to 15. Now your key value pair "old" and "old-value" would be converted to Entry object's key and value instance variable. and then this entry object will be stored in the bucket, or you can say that at a particular index, this entry object would be stored.

FYI- Entry is a class in Map interface- Map.Entry, with these signature/definition

class Entry{
          final Key k;
          value v;
          final int hash;
          Entry next;
}

now when you execute next statement -

myHashmap.put("very-old","very-old-value");

and "very-old" gives same hashcode as "old", so this new key value pair is again sent to the same index or the same bucket. But since this bucket is not empty, then the next variable of the Entry object is used to store this new key value pair.

and this will be stored as linked list for every object which have the same hashcode, but a TRIEFY_THRESHOLD is specified with value 6. so after this reaches, linked list is converted to the balanced tree(red-black tree) with first element as the root.

score 2 · Accepted Answer

每个 Entry 对象代表键值对。如果存储桶有多个条目，则字段 next 引用其他条目对象。

有时可能会发生 2 个不同对象的 hashCode 相同。在这种情况下，2 个对象将保存在一个存储桶中，并将显示为 LinkedList。入口点是最近添加的对象。该对象引用具有下一个字段的其他对象，等等。最后一个条目是指空值。当您使用默认构造函数创建 HashMap

创建的数组大小为 16，默认负载平衡为 0.75。

在此处输入图像描述

（来源）

score 1 · Accepted Answer

散列图的工作原理是散列

HashMap get(Key k) 方法在 key 对象上调用 hashCode 方法，并将返回的 hashValue 应用于它自己的静态哈希函数，以找到一个存储桶位置（支持数组），其中键和值以嵌套类的形式存储，称为 Entry (Map.入口）。因此，您从上一行得出结论，键和值都以 Entry object 的形式存储在存储桶中。所以认为只有值存储在桶中是不正确的，不会给面试官留下好印象。

每当我们在 HashMap 对象上调用 get(Key k) 方法时。首先它检查 key 是否为 null 。请注意，HashMap 中只能有一个空键。

如果 key 为 null ，则 Null 键始终映射到哈希 0，因此索引为 0。

如果 key 不为空，它会在 key 对象上调用 hashfunction，见上面方法中的第 4 行，即 key.hashCode() ，所以在 key.hashCode() 返回 hashValue 之后，第 4 行看起来像

            int hash = hash(hashValue)

现在，它将返回的 hashValue 应用到自己的散列函数中。

我们可能想知道为什么要使用 hash(hashValue) 再次计算哈希值。答案是它可以防御劣质散列函数。

现在，最终的哈希值用于查找存储 Entry 对象的存储桶位置。条目对象像这样存储在桶中（hash,key,value,bucketindex）

score 1 · Accepted Answer

我不会详细介绍 HashMap 的工作原理，但会举一个例子，这样我们就可以通过将 HashMap 与现实联系起来来记住它是如何工作的。

我们有 Key、Value、HashCode 和 bucket。

有一段时间，我们会将它们与以下内容联系起来：

桶 -> 一个社会
HashCode -> 社团地址（始终唯一）
价值 -> 社会中的一所房子
键 -> 住宅地址。

使用 Map.get(key) ：

Stevie 想去他朋友 (Josse) 的房子，他住在 VIP 协会的别墅里，让它成为 JavaLovers Society。Josse 的地址是他的 SSN（每个人都不同）。维护了一个索引，我们可以在其中根据 SSN 找出协会的名称。这个索引可以被认为是一种找出HashCode的算法。

SSN 协会名称
92313(Josse's) -- JavaLovers
13214 -- AngularJSLovers
98080 -- Java爱好者
53808 -- 生物学爱好者

这个 SSN(key) 首先给了我们一个 HashCode(来自索引表)，它只是 Society 的名称。
现在，多个房子可以在同一个社会中，所以HashCode可以是通用的。
假设，Society 对两栋房子来说是公用的，我们如何识别我们要去哪栋房子，是的，通过使用（SSN）密钥，它只不过是房子地址

使用 Map.put(key,Value)

这通过查找 HashCode 为该值找到合适的社会，然后存储该值。

我希望这会有所帮助，并且可以进行修改。

score 1 · Accepted Answer

Bearing in mind the explanations here for the structure of a hashmap, perhaps someone could explain the following paragraph on Baeldung :-

Java has several implementations of the interface Map, each one with its own particularities.

However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.

As we can see, if we try to insert two values for the same key, the second value will be stored, while the first one will be dropped.

It will also be returned (by every proper implementation of the put(K key, V value) method):

Map<String, String> map = new HashMap<>();
assertThat(map.put("key1", "value1")).isEqualTo(null);
assertThat(map.put("key1", "value2")).isEqualTo("value1");
assertThat(map.get("key1")).isEqualTo("value2");

score 0 · Accepted Answer

0

于 2016-03-20T11:03:37.263 回答

score 0 · Accepted Answer

As it is said, a picture is worth 1000 words. I say: some code is better than 1000 words. Here's the source code of HashMap. Get method:

/**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

So it becomes clear that hash is used to find the "bucket" and the first element is always checked in that bucket. If not, then equals of the key is used to find the actual element in the linked list.

Let's see the put() method:

  /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

It's slightly more complicated, but it becomes clear that the new element is put in the tab at the position calculated based on hash:

i = (n - 1) & hash here i is the index where the new element will be put (or it is the "bucket"). n is the size of the tab array (array of "buckets").

First, it is tried to be put as the first element of in that "bucket". If there is already an element, then append a new node to the list.