
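As the title says, how do you calculate the best number to use, and how do you motivate that choice?

Suppose we want to build a hash table that uses the following hash function:

h(k) = k mod m, where k is the key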

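Various sources have told me:

  1. Use the number of elements to be inserted as the value of m
  2. Use a prime number close to m
  3. Java just uses 31 as its value of m
  4. Some people told me to use a prime close to 2^n as m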

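At this point I'm so confused that I don't know what value to use for m. For example, if we use the table size as m, what happens when we want to grow the table? Do I then have to rehash all the values with the new value of m? If so, why does Java just use 31 as its prime for m?

I've also heard that the table size should be twice the total number of elements in the hash table, and that this is the size to use at every rehash. But then why would we use m = 10 for a table of 10 elements, when it should be m = 20 to leave extra empty space?

Can someone help me understand how to calculate the value of m for different scenarios, such as when we want a static hash table (we know we will only insert 10 elements) or a dynamic one (rehashing after a certain limit)?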

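Let me illustrate my question with the following example:

I am given the values {1, 2, ..., n}.

Question: if I have to use division (mod) in my hash function, what is the optimal value of m?

Scenario 1: n = 100?

Scenario 2: n = 5043?

Bonus question: would the value of m for the hash function differ depending on whether we use an open or a closed hash table?

Note that I don't need to understand Java's hash tables right now, just hash tables in general where I have to use the division (mod) hash function.

Thanks for your time!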


1 Answer


You have several issues here: 1) What should m equal? 2) How much free space should you have in your hash table? 3) Should you make the size of your table be a prime number?

1) As was mentioned in the comments, the h(k) you describe isn't really the hash function; it gives you the index into your hash table. The idea is that every object produces some hash code, which is a non-negative integer. You use the hash code to figure out where to put the object in the hash table, so that you can find it again later. You clearly don't want a hash table of size MAX_INT, so you choose some smaller size m. Then for any object, you take its hash code k, compute k % m, and you have an integer in the interval [0, m-1], which is a valid index into your hash table.
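
To make the index computation concrete, here is a minimal Python sketch. The helper name `bucket_index` is made up for illustration, and it relies on Python's built-in `hash()` as the hash code (which, unlike the idealized description above, can be negative; Python's `%` with a positive m still yields a value in [0, m-1]).

```python
def bucket_index(key, m):
    """Map a hashable key to a slot in the range [0, m-1].

    bucket_index is a hypothetical helper, not part of any library.
    """
    # hash() may return a negative number, but in Python the result of
    # `x % m` for positive m is always in [0, m-1].
    return hash(key) % m


if __name__ == "__main__":
    m = 10
    for key in (7, 12, 105):
        print(key, "->", bucket_index(key, m))
```

Note that 12 and 2 would land in the same slot when m = 10, which is exactly the kind of collision discussed next.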

2) Because a hash table works by using a hash code to find the place in a table where an object should go, you get into trouble if multiple items are assigned to the same location. This is called a collision. Every hash table implementation must deal with collisions, either by putting items into nearby spots (open addressing) or by keeping a linked list of items in each location (separate chaining). No matter the solution, more collisions mean lower performance for your hash table. For that reason, it is recommended that you not let your hash table fill up, because a fuller table makes collisions more likely. Keeping your hash table at least twice as large as the number of items is a common recommendation to reduce the probability of collisions. Obviously, this means you will have to resize your table as it fills up. And yes, this means that you have to rehash each item, since it will go into a different location when you take the modulus by a different value. That is the hidden cost of a hash table: it runs in constant time (assuming few or no collisions), but it can have a large constant factor (amortized resizing, rehashing, etc.).
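
The resize-and-rehash step can be sketched as follows. This is a toy separate-chaining table, not how any particular library does it; the class name `ChainedHashTable`, the initial size of 8, and the doubling policy are all assumptions chosen for illustration. It keeps the table at least twice as large as the number of items, as recommended above.

```python
class ChainedHashTable:
    """Toy separate-chaining hash table (illustrative only).

    The table doubles whenever the number of items exceeds half the
    number of slots, so the load factor stays at or below max_load.
    """

    def __init__(self, m=8, max_load=0.5):
        self.m = m                      # current number of slots
        self.max_load = max_load        # assumed threshold, not a standard
        self.count = 0                  # number of stored items
        self.buckets = [[] for _ in range(m)]

    def _index(self, key, m):
        return hash(key) % m

    def put(self, key, value):
        bucket = self.buckets[self._index(key, self.m)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / self.m > self.max_load:
            self._resize(2 * self.m)

    def get(self, key):
        for k, v in self.buckets[self._index(key, self.m)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self, new_m):
        # Every item must be rehashed: key % new_m generally differs
        # from key % old_m, so items move to different slots.
        old_items = [item for bucket in self.buckets for item in bucket]
        self.m = new_m
        self.buckets = [[] for _ in range(new_m)]
        for k, v in old_items:
            self.buckets[self._index(k, new_m)].append((k, v))


if __name__ == "__main__":
    table = ChainedHashTable()
    for k in range(1, 101):             # the keys {1, ..., 100} from the question
        table.put(k, k * k)
    print(table.m, table.count, table.get(10))
```

Each resize touches every stored item once, but because the table doubles, those costs average out to a constant amount of extra work per insertion.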

3) It is also often recommended that you make the size of your hash table a prime number. This is because a prime size tends to produce a better distribution of items in certain common use cases (for example, when the keys share a common factor with the table size), thus avoiding collisions. Rather than giving a complete explanation here, I will refer you to this excellent answer: "Why should hash functions use a prime number modulus?"
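
A small experiment illustrates the effect. This sketch assumes the keys are all multiples of 4 (a made-up but typical pattern, such as aligned addresses), and the helper `buckets_used` is hypothetical. It counts how many distinct slots the keys occupy for a power-of-two size versus a nearby prime.

```python
def buckets_used(keys, m):
    """Count how many distinct slots the keys occupy under k mod m."""
    return len({k % m for k in keys})


if __name__ == "__main__":
    keys = range(0, 400, 4)             # 100 keys, all multiples of 4
    print("m = 16:", buckets_used(keys, 16))
    print("m = 17:", buckets_used(keys, 17))
```

With m = 16 the keys share the factor 4 with the table size and only 4 of the 16 slots are ever used, while with the prime m = 17 all 17 slots are used.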

Answered 2013-09-13T15:22:49.730