arrays - 你将如何用语言 x 实现哈希表？

Question

这个问题的重点是收集使用不同语言的数组实现哈希表的示例列表。如果有人可以对它们的工作方式以及每个示例发生的情况进行非常详细的概述，那也很好。

编辑：

为什么不直接使用特定语言的内置哈希函数？

因为我们应该知道哈希表是如何工作的并且能够实现它们。这似乎不是一个超级重要的话题，但了解最常用的数据结构之一如何工作对我来说似乎非常重要。如果这要成为编程的维基百科，那么这些是我将来到这里的一些类型的问题。我不是在寻找要在这里写的 CS 书。我可以将 Intro to Algorithms 下架并阅读有关哈希表的章节并获取此类信息。更具体地说，我正在寻找的是代码示例。不仅对我尤其如此，而且对于可能有一天会搜索类似信息并偶然发现此页面的其他人也是如此。

更具体地说：如果你必须实现它们，并且不能使用内置函数，你会怎么做？

你不需要把代码放在这里。将其放入 pastebin 并链接它。

score 20 · Accepted Answer

哈希表是一种允许在恒定时间内查找项目的数据结构。它通过散列一个值并将该值转换为数组中的偏移量来工作。哈希表的概念相当容易理解，但实现起来显然更难。我不会在这里粘贴整个哈希表，但这里有一些我几周前用 C 语言制作的哈希表的片段......

创建哈希表的基础之一是具有良好的哈希函数。我在哈希表中使用了 djb2 哈希函数：

int ComputeHash(char* key)
{
    int hash = 5381;
    while (*key)
    hash = ((hash << 5) + hash) + *(key++);
    return hash % hashTable.totalBuckets;
}

然后是用于创建和管理表中存储桶的实际代码本身

typedef struct HashTable{
    HashTable* nextEntry;
    char*      key;
    char*      value;
}HashBucket;

typedef struct HashTableEntry{
    int totalBuckets;       // Total number of buckets allocated for the hash table
    HashTable** hashBucketArray; // Pointer to array of buckets
}HashTableEntry;
HashTableEntry hashTable;

bool InitHashTable(int totalBuckets)
{
    if(totalBuckets > 0)
    {
        hashTable.totalBuckets = totalBuckets;
        hashTable.hashBucketArray = (HashTable**)malloc(totalBuckets * sizeof(HashTable));
        if(hashTable.hashBucketArray != NULL)
        {
            memset(hashTable.hashBucketArray, 0, sizeof(HashTable) * totalBuckets);
            return true;
        }
    }
    return false;
}

bool AddNode(char* key, char* value)
{
    int offset = ComputeHash(key);
    if(hashTable.hashBucketArray[offset] == NULL)
    {
        hashTable.hashBucketArray[offset] = NewNode(key, value);
        if(hashTable.hashBucketArray[offset] != NULL)
            return true;
    }

    else
    {
        if(AppendLinkedNode(hashTable.hashBucketArray[offset], key, value) != NULL)
            return true;
    }
    return false;
}

HashTable* NewNode(char* key, char* value)
{
    HashTable* tmpNode = (HashTable*)malloc(sizeof(HashTable));
    if(tmpNode != NULL)
    {
        tmpNode->nextEntry = NULL;
        tmpNode->key   = (char*)malloc(strlen(key));
        tmpNode->value = (char*)malloc(strlen(value));

        strcpy(tmpNode->key, key);
        strcpy(tmpNode->value, value);
    }
    return tmpNode;
}

AppendLinkedNode 找到链表中的最后一个节点并将一个新节点附加到它。

该代码将像这样使用：

if(InitHashTable(100) == false) return -1;
AddNode("10", "TEN");

查找节点很简单：

HashTable* FindNode(char* key)
{
    int offset = ComputeHash(key);
    HashTable* tmpNode = hashTable.hashBucketArray[offset];
    while(tmpNode != NULL)
    {
        if(strcmp(tmpNode->key, key) == 0)
            return tmpNode;
        tmpNode = tmpNode->nextEntry;
    }
    return NULL;
}

并使用如下：

char* value = FindNode("10");

score 8 · Accepted Answer

< 60 LoC 中的 Java 实现

import java.util.ArrayList;
import java.util.List;
import java.util.Random;


public class HashTable {

    class KeyValuePair {

        Object key;

        Object value;

        public KeyValuePair(Object key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private Object[] values;

    private int capacity;

    public HashTable(int capacity) {
        values = new Object[capacity];
        this.capacity = capacity;
    }

    private int hash(Object key) {
        return Math.abs(key.hashCode()) % capacity;
    }

    public void add(Object key, Object value) throws IllegalArgumentException {

        if (key == null || value == null)
            throw new IllegalArgumentException("key or value is null");

        int index = hash(key);

        List<KeyValuePair> list;
        if (values[index] == null) {
            list = new ArrayList<KeyValuePair>();
            values[index] = list;

        } else {
            // collision
            list = (List<KeyValuePair>) values[index];
        }

        list.add(new KeyValuePair(key, value));
    }

    public Object get(Object key) {
        List<KeyValuePair> list = (List<KeyValuePair>) values[hash(key)];
        if (list == null) {
            return null;
        }
        for (KeyValuePair kvp : list) {
            if (kvp.key.equals(key)) {
                return kvp.value;
            }
        }
        return null;
    }

    /**
     * Test
     */
    public static void main(String[] args) {

        HashTable ht = new HashTable(100);

        for (int i = 1; i <= 1000; i++) {
            ht.add("key" + i, "value" + i);
        }

        Random random = new Random();
        for (int i = 1; i <= 10; i++) {
            String key = "key" + random.nextInt(1000);
            System.out.println("ht.get(\"" + key + "\") = " + ht.get(key));
        }   
    }
}

score 7 · Accepted Answer

HashTable 是一个非常简单的概念：它是一个数组，其中键和值对通过以下方案放入其中（如关联数组）：

散列函数将（希望）未使用的索引的键散列到数组中。然后将该值放入该特定索引处的数组中。

数据检索很容易，因为可以通过哈希函数计算数组的索引，因此查找时间约为 O(1)。

当哈希函数将 2 个不同的键映射到同一个索引时，就会出现问题……有很多方法可以处理这个问题，我不会在这里详述……

哈希表是一种快速存储和检索数据的基本方式，并且几乎“内置”在所有编程语言库中。

score 3 · Accepted Answer

我一直在寻找一个完全可移植的 C 哈希表实现，并开始对如何自己实现一个感兴趣。经过一番搜索后，我发现： Julienne Walker 的 The Art of Hashing有一些关于散列和散列表的很棒的教程。实现它们比我想象的要复杂一些，但这是一本很棒的书。

score 1 · Accepted Answer

Here is my code for a Hash Table, implemented in Java. Suffers from a minor glitch- the key and value fields are not the same. Might edit that in the future.

public class HashTable
{
    private LinkedList[] hashArr=new LinkedList[128];
    public static int HFunc(int key)
    {
        return key%128;
    }


    public boolean Put(Val V)
    {

        int hashval = HFunc(V.getKey());
        LinkedNode ln = new LinkedNode(V,null);
        hashArr[hashval].Insert(ln);
        System.out.println("Inserted!");
        return true;            
    }

    public boolean Find(Val V)
    {
        int hashval = HFunc(V.getKey());
        if (hashArr[hashval].getInitial()!=null && hashArr[hashval].search(V)==true)
        {
            System.out.println("Found!!");
            return true;
        }
        else
        {
            System.out.println("Not Found!!");
            return false;
        }

    }
    public boolean delete(Val v)
    {
        int hashval = HFunc(v.getKey());
        if (hashArr[hashval].getInitial()!=null && hashArr[hashval].delete(v)==true)
        {
            System.out.println("Deleted!!");
            return true;
        }
        else 
        {
            System.out.println("Could not be found. How can it be deleted?");
            return false;
        }
    }

    public HashTable()
    {
        for(int i=0; i<hashArr.length;i++)
            hashArr[i]=new LinkedList();
    }

}

class Val
{
    private int key;
    private int val;
    public int getKey()
    {
        return key;
    }
    public void setKey(int k)
    {
        this.key=k;
    }
    public int getVal()
    {
        return val;
    }
    public void setVal(int v)
    {
        this.val=v;
    }
    public Val(int key,int value)
    {
        this.key=key;
        this.val=value;
    }
    public boolean equals(Val v1)
    {
        if (v1.getVal()==this.val)
        {
            //System.out.println("hello im here");
            return true;
        }
        else 
            return false;
    }
}

class LinkedNode
{
    private LinkedNode next;
    private Val obj;
    public LinkedNode(Val v,LinkedNode next)
    {
        this.obj=v;
        this.next=next;
    }
    public LinkedNode()
    {
        this.obj=null;
        this.next=null;
    }
    public Val getObj()
    {
        return this.obj;    
    }
    public void setNext(LinkedNode ln)
    {
        this.next = ln;
    }

    public LinkedNode getNext()
    {
        return this.next;
    }
    public boolean equals(LinkedNode ln1, LinkedNode ln2)
    {
        if (ln1.getObj().equals(ln2.getObj()))
        {
            return true;
        }
        else 
            return false;

    }

}

class LinkedList
{
    private LinkedNode initial;
    public LinkedList()
    {
        this.initial=null;
    }
    public LinkedList(LinkedNode initial)
    {
        this.initial = initial;
    }
    public LinkedNode getInitial()
    {
        return this.initial;
    }
    public void Insert(LinkedNode ln)
    {
        LinkedNode temp = this.initial;
        this.initial = ln;
        ln.setNext(temp);
    }
    public boolean search(Val v)
    {
        if (this.initial==null)
            return false;
        else 
        {
            LinkedNode temp = this.initial;
            while (temp!=null)
            {
                //System.out.println("encountered one!");
                if (temp.getObj().equals(v))
                    return true;
                else 
                    temp=temp.getNext();
            }
            return false;
        }

    }
    public boolean delete(Val v)
    {
        if (this.initial==null)
            return false;
        else 
        {
            LinkedNode prev = this.initial;
            if (prev.getObj().equals(v))
            {
                this.initial = null;
                return true;
            }
            else
            {
                LinkedNode temp = this.initial.getNext();
            while (temp!=null)
            {
                if (temp.getObj().equals(v))
                {
                    prev.setNext(temp.getNext());
                    return true;
                }
                else
                {
                    prev=temp;
                    temp=temp.getNext();
                }
            }
            return false;
            }
        }
    }
}

score 0 · Accepted Answer

我认为你需要更具体一点。关于以下选项，哈希表有几种变体

哈希表是固定大小的还是动态的？
使用什么类型的哈希函数？
调整哈希表大小时是否有任何性能限制？

该列表可以继续下去。这些约束中的每一个都可能导致任何语言的多个实现。

就个人而言，我只会使用以我选择的语言提供的内置哈希表。我什至会考虑实现我自己的唯一原因是由于性能问题，即使这样也很难击败大多数现有的实现。

score 0 · Accepted Answer

对于可能使用上述代码的人，请注意内存泄漏：

tmpNode->key   = (char*)malloc(strlen(key)+1);   //must have +1 for '\0'
tmpNode->value = (char*)malloc(strlen(value)+1); //must have +1
strcpy(tmpNode->key, key);
strcpy(tmpNode->value, value);

并完成代码：

HashNode* AppendLinkedNode(HashNode* ptr, char* key, char* value)
{
    //follow pointer till end
    while (ptr->nextEntry!=NULL)
    {
        if (strcmp(ptr->value,value)==0) return NULL;
        ptr=ptr->nextEntry;
    }
    ptr->nextEntry=newNode(key,value);
    return ptr->nextEntry;
}

score 0 · Accepted Answer

F# 中作为函数的最小实现，它从给定的键值对序列构建哈希表并返回一个函数，该函数在哈希表中搜索与给定键关联的值：

> let hashtbl xs =
    let spine = [|for _ in xs -> ResizeArray()|]
    let f k = spine.[abs(k.GetHashCode()) % spine.Length]
    Seq.iter (fun (k, v) -> (f k).Add (k, v)) xs
    fun k -> Seq.pick (fun (k', v) -> if k=k' then Some v else None) (f k);;
val hashtbl : seq<'a * 'b> -> ('a -> 'b) when 'a : equality

score -1 · Accepted Answer

我去阅读了一些关于散列的维基百科页面：http ://en.wikipedia.org/wiki/Hash_table 。在这里为哈希表编写代码似乎需要做很多工作，特别是因为我使用的大多数语言都已经内置了它们。为什么要在这里实现？这些东西确实属于语言库。

请详细说明您预期的解决方案应包括哪些内容：

哈希函数
可变桶数
碰撞行为

还要说明在这里收集它们的目的是什么。任何严肃的实现都会很容易让人满嘴=这将导致很长的答案（每个可能有几页长）。您可能还引诱人们从库中复制代码...

arrays - 你将如何用语言 x 实现哈希表？

9 回答 9

Related

Reference