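I have an assignment to complete. It says I have to read a file containing 3 million strings.
I have to read the file and build a structure to hold the strings. The system must be able to answer the question "Does this new string exist?"
I also need to break the list down into "buckets" of strings, so that the "string to match" can (quickly) choose the correct bucket to search in, and that bucket should contain no more than about total/hashMask strings (i.e. 3,000,000 / 0xFFF == 732 objects per bucket).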
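To make the bucket requirement concrete, here is a minimal sketch of how a hash mask could select a bucket. It assumes the mask has the form 2^k - 1 (such as 0xFFF), so that `hash & mask` always lands in one of `mask + 1` buckets. The names `hash_string`, `bucket_index` and `expected_per_bucket` are hypothetical helpers and are not part of the assignment. With 0xFFF there are 4096 buckets, so 3,000,000 evenly spread strings give roughly 732 per bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-31 string hash, without any final reduction,
   so the caller decides how to map it onto a bucket. */
static uint64_t hash_string(const char *str)
{
    uint64_t h = 0;
    while (*str != '\0') {
        h = (unsigned char)*str + (h << 5) - h;   /* h * 31 + c */
        str++;
    }
    return h;
}

/* Map a string to a bucket index in [0, hash_mask]. */
static size_t bucket_index(const char *str, uint64_t hash_mask)
{
    /* Assumption: hash_mask is 2^k - 1 (0xFF, 0xFFF, ...). */
    assert((hash_mask & (hash_mask + 1)) == 0);
    return (size_t)(hash_string(str) & hash_mask);
}

/* Average chain length if the hash spreads the strings evenly. */
static size_t expected_per_bucket(size_t total, uint64_t hash_mask)
{
    return total / (size_t)(hash_mask + 1);
}
```

The figure is only an average: how close each bucket stays to it depends on how evenly the hash function distributes the actual strings in the file.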
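So far I have created the hash table structure, the list, a function to read the file, and the add and remove functions. But I don't understand the part in bold, i.e. the bucket requirement above. Do I need to implement something in the hash function to satisfy it?
Below is my sample code: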
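#include <stdint.h>   /* uint64_t */
#include <stdio.h>    /* FILE, fopen, fgets, printf */
#include <stdlib.h>   /* exit */

#define MAX_NAME 100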
/* Linked list node: one stored string per node */
typedef struct list
{
    char *string;
    int index;
    struct list *next;
} list_t;
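/* Hash table structure (named HashListType to match the function signatures below) */
typedef struct hashTable
{
    int size;        // number of buckets in the table
    list_t **table;  // array of bucket heads, one linked list per bucket
} HashListType;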
HashListType *createHashTable( size_t size)
{
// allocate hash table ..I know how to do it
}
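unsigned int hash(HashListType *hashTable, const char *str)
{
    uint64_t hashVal = 0;
    while (*str != '\0')
    {
        // hashVal * 31 + current character
        hashVal = (unsigned char)*str + (hashVal << 5) - hashVal;
        str++;
    }
    // reduce the hash to a bucket index in [0, size)
    return (unsigned int)(hashVal % (uint64_t)hashTable->size);
}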
void addToHashList( HashListType *list, void *obj, uint64_t hash)
{
// add item of new list to table --> have an idea how to do it
}
void removeFromHashList(HashListType *list, void *criterion, uint64_t hash )
{
// got an idea how to do it
}
/*
this function will read the file (assume one string per line)
and create the list of lists (list of buckets), adding one object per string.
*/
HashListType *loadDataSet(const char *filename, int hashMask)
{
    // open the file for reading
    char readString[MAX_NAME];
    FILE *fp;
    if ((fp = fopen(filename, "r")) == NULL)
    {
        printf("failed to open the file\n");
        exit(EXIT_FAILURE);
    }
    while (fgets(readString, sizeof readString, fp) != NULL)
    {
        // need to break the list down into "buckets" of strings so the 'string to match'
        // is able to choose the correct bucket to search in (quickly),
        // and that bucket should contain no more than total/hashMask strings
        // or so (i.e. 3,000,000 / 0xFFF == 732 objects per bucket).
    }
    fclose(fp);
}
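One possible way to fill in the loading loop is sketched below, as a rough illustration rather than a definitive answer. It allocates `mask + 1` buckets, strips the newline that `fgets` keeps, and pushes each string onto the chain selected by `hash & mask`. The names `node_t`, `table_t`, `table_create`, `table_add`, `table_contains` and `table_load` are hypothetical and deliberately differ from the types in the code above. It assumes the mask has the form 2^k - 1 and that no line is longer than `MAX_NAME - 1` characters (longer lines would be split into several entries).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 100

typedef struct node { char *string; struct node *next; } node_t;
typedef struct { uint64_t mask; node_t **buckets; } table_t;

/* Multiply-by-31 string hash (h * 31 + c for each character). */
static uint64_t hash_string(const char *s)
{
    uint64_t h = 0;
    for (; *s != '\0'; s++)
        h = (unsigned char)*s + (h << 5) - h;
    return h;
}

/* Allocate an empty table with mask + 1 buckets. */
static table_t *table_create(uint64_t mask)
{
    assert((mask & (mask + 1)) == 0);   /* assumption: mask is 2^k - 1 */
    table_t *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->mask = mask;
    t->buckets = calloc((size_t)mask + 1, sizeof *t->buckets);
    if (t->buckets == NULL) { free(t); return NULL; }
    return t;
}

/* "Does this string exist?": walk only the one bucket it hashes to. */
static int table_contains(const table_t *t, const char *s)
{
    const node_t *n = t->buckets[hash_string(s) & t->mask];
    for (; n != NULL; n = n->next)
        if (strcmp(n->string, s) == 0) return 1;
    return 0;
}

/* Copy s and push it onto the front of its bucket; 0 on success. */
static int table_add(table_t *t, const char *s)
{
    size_t len = strlen(s) + 1;
    node_t *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->string = malloc(len);
    if (n->string == NULL) { free(n); return -1; }
    memcpy(n->string, s, len);
    node_t **head = &t->buckets[hash_string(s) & t->mask];
    n->next = *head;
    *head = n;
    return 0;
}

/* Read one string per line; returns NULL if the file cannot be opened
   or the table cannot be allocated. Stops early if an add fails. */
static table_t *table_load(const char *filename, uint64_t mask)
{
    char line[MAX_NAME];
    FILE *fp = fopen(filename, "r");
    if (fp == NULL) return NULL;
    table_t *t = table_create(mask);
    while (t != NULL && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the trailing newline */
        if (table_add(t, line) != 0) break;
    }
    fclose(fp);
    return t;
}
```

In this sketch the hash function itself needs nothing special for the bucket requirement: the bucket count is fixed by the mask when the table is created, and the hash only has to spread the strings evenly. A matching free routine is omitted to keep the example short.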