Create a histogram:
unsigned histogram[tablesize] = {0};
Generate some (or all) possible input strings, compute their hash values, and update the histogram accordingly:
for (unsigned iter = 0; iter < niter; iter++) {
    unsigned hashval = hashfunc(strings[iter]) % tablesize; // strings holds the test keys; reduce the hash to a bucket index
    histogram[hashval] += 1;
}
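For concreteness, here is a self-contained sketch of that loop in C++. The hashfunc shown (a simple djb2-style hash) and the generated "key0", "key1", ... test strings are only placeholders; swap in the hash function and the key set you actually want to test, and pick whatever tablesize suits you.

#include <cstdio>   // for the printf reports in the fragments further down
#include <string>
#include <vector>

// Stand-in for the hash function under test (djb2-style); replace with your own.
unsigned hashfunc(const std::string& s) {
    unsigned h = 5381;
    for (unsigned char c : s)
        h = h * 33 + c;
    return h;
}

int main() {
    const unsigned tablesize = 1u << 16;   // 65536 buckets (arbitrary choice)
    const unsigned niter = tablesize;      // as many keys as buckets, i.e. load factor 1
    std::vector<unsigned> histogram(tablesize, 0);

    for (unsigned iter = 0; iter < niter; iter++) {
        std::string key = "key" + std::to_string(iter);   // placeholder test keys
        unsigned hashval = hashfunc(key) % tablesize;     // reduce the hash to a bucket index
        histogram[hashval] += 1;
    }
    // ... analyse the histogram here (see the fragments below) ...
}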
Now you have to analyse the histogram for lumps / clusters. The rule of thumb is that for tablesize == niter you expect about 30 % of the cells to have count = 1 and about 30 % to be empty; the rest hold two or more entries.
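A quick way to check that rule of thumb is to scan the histogram and report how many buckets are empty, hold exactly one key, or hold two or more. This fragment is meant to go where the "analyse the histogram" comment sits in the sketch above, reusing its histogram and tablesize:

    // Classify the buckets and compare the fractions against the rule of thumb.
    unsigned empty = 0, single = 0, multi = 0;
    for (unsigned count : histogram) {
        if (count == 0)      empty++;
        else if (count == 1) single++;
        else                 multi++;
    }
    std::printf("empty: %.1f%%  single: %.1f%%  2+: %.1f%%\n",
                100.0 * empty  / tablesize,
                100.0 * single / tablesize,
                100.0 * multi  / tablesize);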
If you sum (count * (count + 1)) / 2 over all cells and divide by the tablesize (which equals the number of entries here), you should expect a value around 1.5. A bad hash function gives higher values; a perfect hash would only have cells with count = 1 (and thus a ratio of 1). With linear probing you should of course never use tablesize = niter, but make tablesize bigger, say twice as big. You can still use the same metric (number of probes / number of entries) to analyse its performance, though.
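The same metric as a fragment of the sketch above: sum count * (count + 1) / 2 over all buckets and normalise by the number of entries. Values near 1.5 suggest the hash spreads keys about as well as a random assignment would at load factor 1; a value of 1.0 would mean a perfect hash.

    // Probe-count metric: sum count*(count+1)/2 and divide by the number of entries.
    double probes = 0.0;
    for (unsigned count : histogram)
        probes += count * (count + 1) / 2.0;
    std::printf("probes per entry: %.3f (expect ~1.5 for a decent hash, 1.0 for a perfect one)\n",
                probes / niter);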
UPDATE: a great introduction to hash functions and their performance can be found at http://www.strchr.com/hash_functions.