4

我的导师把这个扔给了我们,并告诉我们我们只需要在谷歌上搜索如何编写哈希函数。我在这方面很没有方向。我们为类编写了一个基本的哈希表模板,但我有一个项目,需要将大约 160,000 个字符串分类到一个至少有 500 个桶的表中(我想做更多的速度)。

我只是不知道在哪里可以找到简明易懂的信息。

任何帮助将不胜感激。

4

1 回答 1

5

I suggest a universal hash function. This kind of function guarantees a small number of collisions in expectation, even if the data is chosen by an adversary. There are plenty of universal hash functions.

In case of strings, you can adopt the following hash function.

For a character c we define #(c) the arithmetic value of c ie(ASCII). For a string x=c1c1...cn we define enter image description hereenter image description here

If HSize is an integer and k a big prime number (you define it), for a range 0<a,b<k*HSizelet the hash function be:

enter image description here

This function provides output between [0, HSize-1]

The output value is calculated by horner's rule, finding the modulo of the k*HSize division after every operation.

So, create a function HashFunction and pass the string you want to hash as a parameter. Here is the code:

#define k 7919 
#define Hsize 1009   
#define a 321
#define b 43112

long long HashFunction(string text)
{
  int i;
  long long  res = 0;
  long long M = (Hsize * k);
  cout<<"M = "<<M<<endl;
  cout<<"Hsize = "<<Hsize<<endl;
  cout<<"k = "<<k<<endl;
  int s=text.size();
  for(i = s-1; i >= 0; i--)
  {
    res = a * (res * 256 + (int)text[i]);
    //cout<<"res before modulo = "<<res<<endl;
    res=res % M;
    //cout<<"res after modulo = "<<res<<endl;
  }
    long long res1 = (res + b) / k;
    return res1;
}
于 2013-11-09T15:05:13.600 回答