java - 哈希表中的关键内容的最佳实践

Question

最好的查找结构是HashTable. 它平均提供恒定的访问（在最坏的情况下是线性的）。
这取决于哈希函数。好的。
我的问题如下。假设一个好的实现，HashTable例如HashMap是否有关于映射中传递的键的最佳实践？我的意思是建议键必须是不可变对象，但我想知道是否有其他建议。
例如密钥的大小？例如，在一个好的哈希映射中（以上述方式），如果我们用作String键，“瓶颈”不会出现在字符串比较中equals（试图找到键）吗？那么钥匙应该保持小吗？还是有不应该用作键的对象？例如URL? 在这种情况下，您如何选择使用什么作为密钥？

score 2 · Accepted Answer

HashMap 表现最好的键可能是 Integer，其中hashCode()和equals()实现为：

public int hashCode() {
    return value;
}

public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
}

也就是说，HashMap 的目的是将某些对象（值）映射到其他对象（键）。哈希函数用于寻址（值）对象的事实是提供快速、恒定时间的访问。

建议密钥必须是不可变对象，但我想知道是否还有其他建议。

建议是将对象映射到您需要的对象：不要认为更快；但想想什么是最适合您的业务逻辑来处理要检索的对象。

重要的要求是键对象必须是不可变的，因为如果您在将键对象存储到 Map 之后更改它，以后可能无法检索关联的值。

中的关键词是。你的对象应该只是map。如果你牺牲了优化键的映射任务，那么你就违背了 Map 的目的——可能没有实现任何性能提升。HashMapMap

我 100% 同意您问题中的前两条评论：

主要的限制是它必须是你想要查找的东西;)
– Oli Charlesworth

一般规则是使用您需要查找的任何内容作为键。
——路易斯·瓦瑟曼

记住优化的两条规则：

不。
（仅限专家）还没有。

第三条规则是：profile before to optimize。

score 1 · Accepted Answer

您应该使用您想使用的任何键来查找数据结构中的内容，它通常是特定于域的约束。话虽如此，请记住hashCode()和equals()都将用于在表中查找键。

hashCode()用于查找键的位置，whileequals()用于确定您要搜索的键是否真的是我们刚刚找到的键hashCode()。

例如，考虑使用单独的链接a在表中b具有相同哈希码的两个键。然后，一旦我们找到包含and的列表的索引，就需要测试表中是否可能同时存在and 。aa.equals(key)ababhashCode()

score 0 · Accepted Answer

建议密钥必须是不可变对象，但我想知道是否还有其他建议。

值的键应该是final。

大多数时候，对象的一个字段被用作键。如果该字段发生更改，则地图将找不到它：

void foo(Employee e) {
  map.put(e.getId(), e);
  String newId = e.getId() + "new";
  e.setId(newId);
  Employee e2 = e.get(newId);
  // e != e2 !
}

所以Employee根本不应该有一个setId()方法，但这很困难，因为当你写的时候Employee你不知道它会被什么键控。

score 0 · Accepted Answer

I digged up the implementation. I had an assumption that the effectiveness of the hashCode() method will be the key factor.

When I looked into the HashMap() and the Hashtable() implementation, I found that the implementation is quite similar (with one exception). Both are using and storing an internal hash code for all entries, so that's a good point that hashCode() is not so heavily influencing the performance.

Both are having a number of buckets, where the values are stored. It is important balance between the number of buckets (say n), and the average number of keys within a bucket (say k). The bucket is found in O(1) time, the content of the bucket is iterated in O(k) size, but the more bucket we have, the more memory will be allocated. Also, if many buckets are empty, it means that the hashCode() method for the key class does not the hashcode wide enough.

The algorithm works like this:

Take the `hashCode()` of the Key (and make a slight bijective transformation on it)
Find the appropriate bucket
Loop through the content of the bucket (which is some kind of LinkedList)
Make the comparison of the keys as follows:
1. Compare the hashcodes 
    (it is calculated in the first step, and stored for the entry)
2. Examine if key `==` the stored key (still no call) 
    (this step is missing from Hashtable)
3. Compare the keys by `key.equals(storedKey)`

To summarize:

hashCode() is called once per call (this is a must, you cannot do without it)
equals() is called if the hashCode is not so well spread, and two keys happen to have the same hashcode

The same algorithm is for get() and put() (because in put() case you can set the value for an existing key). So, the most important thing is how the hashCode() method was implemented. That is the most frequently called method.

Two strategies are: make it fast and make it effective (well-spread). The JDK developers made efforts to make it both, but it's not always possible to have them both.

Numeric types are good
Object (and non-overriden classes) are good (hashCode() is native), except that you cannot specify an own equals()
String is not good, iterates through the characters, but caches after that (see my comment below)
Any class with synchronized hashCode() is not good
Any class that has an iteration is not good
Classes that have hashcode cache are a bit better (depends on the usage)

Comment on the String: To make it fast, in the first versions of JDK the String hash code calculation was made for the first 32 characters only. But the hashcode it produced was not well spread, so they decided to take all the characters into the hashcode.

java - 哈希表中的关键内容的最佳实践

4 回答 4

Related

Reference