
Can anyone tell me why the number 5381 is used in the DJB hash function?

The DJB hash function is defined as:

  • $h_0 = 5381$

  • $h_i = 33\,h_{i-1} + s_i$

Here is a C implementation:

unsigned int DJBHash(const char* str, unsigned int len)
{
   unsigned int hash = 5381;   /* starting value h_0 */
   unsigned int i    = 0;

   for(i = 0; i < len; str++, i++)
   {
      /* hash * 33 + current character; (hash << 5) is hash * 32 */
      hash = ((hash << 5) + hash) + (*str);
   }

   return hash;
}

3 Answers


I came across a comment that sheds some light on what DJB was doing:

/*
* DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
*
* This is Daniel J. Bernstein's popular `times 33' hash function as
* posted by him years ago on comp.lang.c. It basically uses a function
* like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
* known hash functions for strings. Because it is both computed very
* fast and distributes very well.
*
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as RSE did now) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well. They
* all distribute in an acceptable way and this way fill a hash table
* with an average percent of approx. 86%.
*
* If one compares the Chi^2 values of the variants, the number 33 not
* even has the best value. But the number 33 and a few other equally
* good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
* advantage to the remaining numbers in the large set of possible
* multipliers: their multiply operation can be replaced by a faster
* operation based on just one shift plus either a single addition
* or subtraction operation. And because a hash function has to both
* distribute good _and_ has to be very fast to compute, those few
* numbers should be preferred and seems to be the reason why Daniel J.
* Bernstein also preferred it.
*
*
* -- Ralf S. Engelschall <rse@engelschall.com>
*/

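This is a slightly different hash function from the one you are looking at, although it does use the 5381 magic number. The code below that comment at the link target has been unrolled.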
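The shift-based shortcut described in the quoted comment can be sketched in C. This is only an illustration: the helper names `times33`, `times31`, and `shortcuts_agree` are hypothetical, and `uint32_t` is assumed so that the arithmetic wraps modulo 2^32 on both sides of each comparison.

```c
#include <stdint.h>

/* Multiply by 33 with one shift and one addition: 33*h = 32*h + h. */
static uint32_t times33(uint32_t h) { return (h << 5) + h; }

/* Multiply by 31 with one shift and one subtraction: 31*h = 32*h - h. */
static uint32_t times31(uint32_t h) { return (h << 5) - h; }

/* Returns 1 if both shortcuts match plain multiplication for h, else 0. */
static int shortcuts_agree(uint32_t h) {
    return times33(h) == h * 33u && times31(h) == h * 31u;
}
```

Calling `shortcuts_agree` on any 32-bit value should return 1, since both forms compute the same product modulo 2^32. Modern compilers often perform this rewrite on their own, so the speed advantage the comment describes may matter less today than when it was written.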

Then I found this:

Magic Constant 5381:

  1. odd number

  2. prime number

  3. deficient number

  4. 001/010/100/000/101 b

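There is also this answer to "Can anybody explain the logic behind djb2 hash function?" It references a mailing-list post by DJB himself that mentions 5381 (the excerpt below is taken from that answer):

[...] practically any good multiplier works. I think you're worrying about the fact that 31c + d doesn't cover any reasonable range of hash values if c and d are between 0 and 255. That's why, when I discovered the 33 hash function and started using it in my compressors, I started with a hash value of 5381. I think you'll find that this does just as well as a 261 multiplier.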
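To see what the starting value changes for short inputs, here is a minimal sketch of a times-33 hash with a caller-supplied seed. The function name `times33_hash` and its `seed` parameter are hypothetical and introduced only for illustration; they are not part of DJB's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Times-33 hash with a caller-chosen starting value (hypothetical helper). */
static uint32_t times33_hash(const char *s, size_t len, uint32_t seed) {
    uint32_t h = seed;
    for (size_t i = 0; i < len; i++) {
        /* Cast so bytes above 127 are not sign-extended. */
        h = h * 33u + (unsigned char)s[i];
    }
    return h;
}
```

With a seed of 0, a two-byte input hashes to 33c + d, which can never exceed 33 * 255 + 255 = 8670; for example, `times33_hash("ab", 2, 0)` gives 3299. Starting from 5381, the same input gives 5863208. The seed shifts the values rather than widening their spread, so how much this helps presumably depends on how the hash is reduced to a table index, which the quoted post does not spell out.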

Answered 2012-12-10T21:08:49.253

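5381 is just a number that, in testing, resulted in fewer collisions and better avalanching. You'll find "magic constants" in just about every hash algorithm.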

Answered 2012-05-22T07:22:33.217

I found a very interesting property of this number that might be a reason for the choice.

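5381 is the 709th prime.
709 is the 127th prime.
127 is the 31st prime.
31 is the 11th prime.
11 is the 5th prime.
5 is the 3rd prime.
3 is the 2nd prime.
2 is the 1st prime.

5381 is the first number for which this happens 8 times. The 5381st prime may exceed the limit of a signed int, so it is a good place to stop the chain.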
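The chain above can be checked with a small sieve. This is a sketch only: `LIMIT`, `nth_prime`, and `prime_chain` are hypothetical names chosen for this example, and the sieve is rebuilt on every call for simplicity rather than speed.

```c
#include <string.h>

#define LIMIT 6000  /* comfortably above 5381 */

/* Returns the n-th prime (1-based, so nth_prime(1) == 2),
   or 0 if that prime is not below LIMIT. */
static int nth_prime(int n) {
    static char composite[LIMIT];
    memset(composite, 0, sizeof composite);
    /* Sieve of Eratosthenes over [2, LIMIT). */
    for (int i = 2; i * i < LIMIT; i++) {
        if (!composite[i]) {
            for (int j = i * i; j < LIMIT; j += i) composite[j] = 1;
        }
    }
    int count = 0;
    for (int i = 2; i < LIMIT; i++) {
        if (!composite[i] && ++count == n) return i;
    }
    return 0;
}

/* Starts at 1 and applies p -> nth_prime(p) `steps` times. */
static int prime_chain(int steps) {
    int p = 1;
    for (int k = 0; k < steps; k++) p = nth_prime(p);
    return p;
}
```

Under these definitions, `prime_chain(8)` walks 1, 2, 3, 5, 11, 31, 127, 709 and ends at 5381, matching the eight steps listed above.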

Answered 2017-01-25T11:07:13.153