0

这是一个前缀散列函数。我想计算这种方法中的碰撞次数,但我不知道该怎么做。看起来它可能很简单,但我就是想不出一个很好的方法来做到这一点......

int HashTable_qp::preHash(string & key, int tableSize )
{
    string pad = "AA";
    //some words in the input are less than 3 letters
    //I choose to pad the string with A because all padded characters 
    //have same ascii val, which is low, and will hopefully alter the results less
    if (key.length() < 3)
    {
        key.append(pad);
    }
    return ( key[0] + 27 * key[1] + 729 * key[2] ) % tableSize;
}
4

3 回答 3

1

如果它是一个数组作为底层数据结构: int hash = preHash(&key, array.length); if(array[hash] != null) this.count++; 如果它是一个链表数组,那么:

if(array[hash] != null && *(array[hash]) != null)
this.count++

如果您只能访问 stl 库,我相信在调用散列函数之后在添加之前测试该元素是否为空就足够了。

于 2011-12-05T23:53:38.227 回答
1

create a histogram:

 unsigned histogram[tablesize] = {0};

generate some (all) possible strings and compute their hashval, and update the histogram accordingly:

for(iter=0; iter < somevalue; iter++) {
 hashval = hashfunc( string.iterate(iter) ); // I don't know c++
 histogram[hashval] +=1;
 }

Now you have to analyze the hashtable for lumps / clusters. Rule of thumb is that for (tablesize==iter), you expect about 30 % cells with count =1, and about 30 % empty; the rest has two or more.

If you sum all the (count*(count+1))/2, and divide by the tablesize, you should expect around 1.5. A bad hashfunction gives higher values, a perfect hash would only have cells with count=1 (and thus: ratio=1) With linear probing you should of course never use tablesize=niter, but make tablesize bigger, say twice as big. You can use the same metric (number of probes / number of entries), to analyse its performance, though.

UPDATE: a great introduction on hashfunctions and their performance can be found at http://www.strchr.com/hash_functions .

于 2011-12-06T00:17:18.593 回答
0

您可以创建一个整数数组,每个整数代表一个哈希值。当您完成使哈希循环返回嵌套循环中的数组时。如果您有以下数组,

[0] -> 13
[1] -> 5
[2] -> 12
[3] -> 7
[4] -> 5

对于0..n中的每个项目i,检查项目i+1..n是否匹配。在英语中,这将是:检查每个元素是否等于它之后的任何元素。

于 2011-12-05T23:49:00.100 回答