2

当我们只在哈希表中存储键的结果数据时,如何执行哈希表调整大小?我能想到的一个例子是存储用户名和密码组合。为了隐私起见,我们只存储密码(也可以有许多其他用例)。在这里,表只存储数据但不存储键。鉴于此,如果我想在调整大小期间将条目从旧表复制到新表,我没有散列的密钥。

这里如何调整大小?

4

4 回答 4

1

在这种情况下,您根本没有真正使用哈希表——或者至少您所说的哈希没有被用作表的哈希。

换句话说,密码的散列(例如,SHA-256)只是另一条数据,存储在表(散列表或其他)中。仅当/当密码更改时才会修改。

它可能存储一个以用户名为键的哈希表中——但如果是这样,您将在此处拥有用户名,以便您可以在需要时重新哈希。

如果出于某种原因,您决定使用密码的安全散列作为表中的键,则完整的散列将键,而您用于索引表的将是该散列的某个散列(例如,两个字节的块全部异或在一起)。

Edit: As far as table resizing (by itself) goes, no, storing the key is not an absolute necessity. You can get by with just storing the remainder of the original hash code. In other words, you'll typically start by generating, say, a 32-bit hash. You then use part (but only part) of that to index into your table (e.g., 16 bits). When it comes time to resize the table, you will take the position in the table for 16 bits, and the stored remainder of the hash for the other 16 bits to recover the original 32-bit hash. Assuming you're doubling the size of your table, you'll then use 17 bits for the index, and store the remaining 15 bits in the table.

If you didn't mind going a bit insane with the hash itself, you could also use this to eliminate any real need to store the key at all. For example, if you started by creating a 256-bit hash (e.g., SHA-256), you could use N bits as the index into the hash table, and the remaining bits as if it were the key. If your actual keys were longer than 256 bits, this would lead to the possibility of collisions -- but collisions with SHA-256 are sufficiently rare that the odds of encountering one by chance are almost certainly lower than the chances of a computer error during a key comparison, so it said two keys were identical when they really weren't.

于 2012-12-08T23:40:38.223 回答
1

Short version: Hashtables can't work with only the hash codes of their data; they must contain actual data. If all you provide to the hash table is a hash, then it's the data. A hashtable of hashes, which is what you seem to be trying to create, would not have problems with resizing. As far as the hashtable is concerned, the hash you inserted is the data, and whatever was hashed to produce the data is irrelevant.


Long version:

A hashtable and a table of hashes are two orthogonal ideas. It sounds like you're conflating the two in a way that doesn't make much sense in either context.

For a hashtable, the entire point is to map keys unambiguously to values. Without both, the table is useless. The hash code that a hashtable uses is not, and can not be, logically separated from the data -- it comes from the data, and is often calculated on the fly. The sole purpose of the hash code is to quickly locate (or store) the actual key->value mapping; you still need an actual key and an actual value to map.

The big reason for needing the actual key is, the hash code only goes so far. To some degree, every useful hashtable and every hash function is inherently bound by the pigeonhole principle. This means

  • Every hash function by definition can return equal hash codes for two different values; and
  • Every useful hashtable has to resolve collisions somehow (between equal hash codes and/or between different hash codes that point at the same bucket).

Hashtables are actually doubly affected by the pigeonhole principle; they not only have to reduce the data to a (typically int-sized) hash code, but since most hash tables can't reasonably have 4 billion buckets, the code then has to do some further reduction with that hash code to turn it into a bucket number. (A common example is taking the hash code times some prime number, mod the number of buckets.)

Even read-only hashtables typically have to resolve collisions. Imagine a read-only hash table that uses a hash function so perfect that every value in the table ends up in its own bucket. Even in this case, what happens when you try to find a key that's not in the table, but produces a hash code that resolves to a bucket containing a key that is in the table? Either you need the value itself to double-check against, or the table lies and says the key exists, even if the hash code is not actually equal!

Basically the table uses the hash code of the lookup key (or, more typically, the return value from some function on that hash code) to figure out what bucket to look in, and then compares each actual key in that bucket with the actual lookup key in order to disambiguate. This wouldn't work without an actual key of some sort, unless:

  • your hash function was guaranteed to produce a unique value for every input (making it no longer a hash function, but an encoding or encryption function), and
  • you had an infinite number of buckets.

The only way you could have hashes in a hash table without the original data, is if the hashes are the data. In that case you have a hashtable of hashes. In the case you're mentioning, you might use a hash of the username as a key, mapping to a SHA /mcrypt/bcrypt/whatever hash of the password. In that case, the hash is the key, and no one cares about the username anymore. This would not cause problems with resizing or whatever, as the hash function the hashtable uses has very little to do with the hash function you used; as far as the hashtable cares, the hash you gave it is just another value, and has its own hash code, and that hash code is what the hashtable uses internally to find stuff.

I might warn you against using hashes for usernames, though. I'd suggest that at least one piece of the auth data be collision-proof, and hashes are by their very nature not collision-proof. And i'll personally tear up your programming license if i catch you running around storing unhashed passwords in a potentially user-visible location. :) Perhaps if you encrypted the username instead of hashing it? That'd effectively guarantee its uniqueness, and if you use public-key encryption and "forget" the private key, it's effectively as irreversible as hashing. Nifty side effect is, public-key encryption is generally kinda slow due to how it works. You're basically taking a 1024-bit or larger number and multiplying it by itself thousands of times. So you have some built-in protection against brute force.

于 2012-12-09T00:51:49.893 回答
0

您也可以通过存储原始密钥来做到这一点。对于密码,您不会将它们存储在哈希表中。

于 2012-12-08T23:38:56.173 回答
0

我想这并不能回答您的问题,但我想知道哈希表如何完全不知道其条目的键。当然,除非您确定永远不会发生碰撞。在给定的键是用户名的示例中,我看不出这怎么可能(除非哈希表一开始就大得离谱)。

无论哪种方式,如果我们谈论的是实际的哈希表,而不是直接访问表,我认为在不知道密钥的情况下在不同大小的表之间复制条目在逻辑上是不可能的,因为通常哈希表的大小很重要哈希函数的一部分。

于 2012-12-08T23:40:31.583 回答