c++ - 在 C++ 中实现 Hashmap :: 用于模板化数据类型的散列函数

Question

我最近一直在使用 STL 的 unordered_map，虽然它似乎工作得很好，但鉴于数据类型作为模板参数给出，我不太了解哈希函数是如何工作的。为了更彻底地理解这个数据结构，我在 C++ 中实现了我自己的小 Hashmap 类：

哈希图接口：

#ifndef _HASHMAP_H_
#define _HASHMAP_H_

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <vector.h>


//Beginning of Hashmap class definition

template <class Key, class Value>
class Hashmap{
private:

int mappedElementCount;



public:
explicit Hashmap();
virtual ~Hashmap();


virtual void test();

virtual int hash(Key*);

int* getSize();

void putKVPair(Key*,Value*);

void clearMap();


//When we use these methods, we'll want a linear vector of keys and values to 
    //iterate over, so vector is good here
std::vector<Key>* getKeys();
std::vector<Value>* getValues();

}; //end hashmap class definition
#endif /*_HASHMAP_H_*/

哈希图实现：

#include "Hashmap.h"

template<class Key,class Value> Hashmap<Key,Value>::Hashmap(){
mappedElementCount = 0;
}
template<class Key,class Value> Hashmap<Key,Value>::~Hashmap(){
printf("\nDestroying the base Hashmap object!\n");
}

template<class Key,class Value> void Hashmap<Key,Value>::test(){
printf("The size of our Key is %i and the size of our Value is
    %i\n",sizeof(Key),sizeof(Value));
}


template<class Key,class Value> int Hashmap<Key,Value>::hash(Key* k_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;

        //TODO: How do we generate a hash signature when we don't know what data type 
        //we're going to be working with?

    return hashval % mappedElementCount;

}

template<class Key,class Value> std::vector<Key>* Hashmap<Key,Value>::getKeys(){
//TODO: prepare a vector initialized with all Key objects and return it here
return keys;    
}

template<class Key,class Value> std::vector<Value>* Hashmap<Key,Value>::getValues(){
//TODO: prepare a vector initialized with all Value objects and return it here
return values;  
}

template<class Key,class Value> int* Hashmap<Key,Value>::getSize(){
return &mappedElementCount;
}

template<class Key,class Value> void Hashmap<Key,Value>::putKVPair(Key* k, Value* v){
    //TODO: implement hashing of the key object k to determine
    //the address of the value object v

    //first step, generate a hash from our key
    int tempHash = hash(k);

       //TODO: store the Value at an address given by or influenced by tempHash

    //If all was successfully completed, increment the mapped records counter
    mappedElementCount++;
}



template<class Key,class Value> void Hashmap<Key,Value>::clearMap(){
//TODO: implement a cascading chain of deallocation of stored objects within the 
    //hashmap
//MAYBE-- only if we create new objects rather than just mapping reference 
    //associations,
//which is really the goal here...  In the latter case, just empty the Hashmap 
    //itself
}

解决此问题的一种可能的 OOP 方法是使用 Hashmap 作为基类，并提供具有已知 Key 数据类型的派生类，例如以下 Stringmap：

字符串映射接口：

#ifndef _STRINGMAP_H_
#define _STRINGMAP_H_

#include "Hashmap.h"

template <class Value>
class Stringmap:public Hashmap<std::string,Value>{
private:

public:
//Con/de 'structors
explicit Stringmap();
~Stringmap();

//Here we know our Key will be of type std::string
//so we can generate our hash sig by char values
    //Override hash from the base class
int hash(std::string*);

//override test from base class
void test();


};
#endif /*_STRINGMAP_H_ def*/

字符串映射实现：

#include "Stringmap.h"

template<class Value> Stringmap<Value>::Stringmap():Hashmap<std::string,Value>(){

}
template<class Value> Stringmap<Value>::~Stringmap(){
printf("\nDestroying the derived stringmap object!\n");
}

template<class Value> void Stringmap<Value>::test(){
printf("The size of our Value is %i\n",sizeof values[0]);
}

template<class Value> int Stringmap<Value>::hash(std::string* str_ptr){

    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;


    /* for each character, we multiply the old hash by 31 and add the current
     * character.  Remember that shifting a number left is equivalent to
     * multiplying it by 2 raised to the number of places shifted.  So we
     * are in effect multiplying hashval by 32 and then subtracting hashval.
     * Why do we do this?  Because shifting and subtraction are much more
     * efficient operations than multiplication.
     */
    for(int i=0;i<str_ptr->length();i++) {
        hashval = (*(str_ptr))[i] + ((hashval << 5) - hashval);
    }

    /* we then return the hash value mod the hashmap size so that it will
     * fit into the necessary range
     */
    return hashval % (*(Hashmap<std::string,Value>::getSize()));

}

那么问题来了：当要散列的数据类型目前未知时，是否可以创建散列签名？如果是这样，怎么做？查看 std::hash 文档，似乎 C++ 标准只是为每个原始数据类型以及 T* （对于任何类型 T）定义了一个哈希函数......缺少的是如何为给定的实现这种哈希原始数据类型，更重要的是，它是如何为通用 T* 实现的。我想我可以只调用 hash(Key) 并希望最好，但很高兴了解幕后发生的事情。

谢谢，CCJ

score 5 · Accepted Answer

std::unorderd_map接受 2 个显式模板参数（Key和Value），并且还有一堆隐藏的模板参数，其中 Hash 函数默认为std::hash<Key>.

此 STL 哈希函数std::hash<Key>接受 aKey并返回 a std::size_t。它已经专门用于所有整数类型和std::string. 从这个参考网站

散列模板定义了一个实现散列函数的函数对象。此函数对象的实例定义了一个 operator()，它：

接受一个 Key 类型的参数。

返回一个 size_t 类型的值，它表示参数的哈希值。

调用时不抛出异常。

对于两个相等的参数 k1 和 k2，std::hash()(k1) == std::hash()(k2)。

对于不相等的两个不同参数k1和k2，std::hash()(k1) == std::hash()(k2)的概率应该很小，接近1.0/std::numeric_limits::max ()。

哈希模板既是 CopyConstructible 又是 Destructible 的。无序关联容器 std::unordered_set、std::unordered_multiset、std::unordered_map、std::unordered_multimap 使用模板 std::hash 的特化作为默认散列函数。

参考以这句话结尾：

** 实际的散列函数是依赖于实现的，除了上面指定的那些之外，不需要满足任何其他质量标准。 **

因此，您可以查看系统的实现，但这并不能保证其他系统的实现。

score 3 · Accepted Answer

有一个std::hash<T>专门针对各种类型的模板，您可以专门针对自己的类型。

默认情况下，std::unordered_map<T>只需将散列委托给std::hash<T>（或者您可以指定不同的散列函数作为模板参数）。

因此std::unordered_map，根本不需要了解散列机制。

至于如何std::hash实现，没有具体说明。但是，我认为可以合理地假设任何体面的编译器都会提供高质量的实现。要记住的一个问题是，std::hash<char*>它不散列 C 字符串，它只散列指针的值（在那里:)）

c++ - 在 C++ 中实现 Hashmap :: 用于模板化数据类型的散列函数

2 回答 2

Related

Reference