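Suppose we have a billion unique images, one megabyte each. We compute the SHA-256 hash of each file's contents. The probability of a collision depends on:
- the number of files
- the size of a single file
To what extent can we ignore this possibility and assume it is zero?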
The usual answer goes thus: what is the probability that a rogue asteroid crashes on Earth within the next second, obliterating civilization-as-we-know-it, and killing off a few billion people? It can be argued that any unlucky event with a probability lower than that is not actually very important.
If we have a "perfect" hash function with output size $n$, and we have $p$ messages to hash (individual message length is not important), then probability of collision is about $p^2/2^{n+1}$ (this is an approximation which is valid for "small" $p$, i.e. substantially smaller than $2^{n/2}$). For instance, with SHA-256 ($n=256$) and one billion messages ($p=10^9$) then the probability is about $4.3 \times 10^{-60}$.
A mass-murderer space rock happens about once every 30 million years on average. This puts the probability of such an event occurring in the next second at about $10^{-15}$. That's 45 orders of magnitude more probable than the SHA-256 collision. Briefly stated, if you find SHA-256 collisions scary then your priorities are wrong.
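The arithmetic above can be checked with a short sketch. It assumes the same birthday-bound approximation and the same "once every 30 million years" figure; the function and variable names are illustrative only.

```python
import math

def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Birthday-bound approximation p^2 / 2^(n+1).

    Only meaningful while num_messages is far below 2^(hash_bits / 2).
    """
    # Python integers are arbitrary precision, so 2**257 is exact here;
    # the true division then yields a correctly rounded float.
    return num_messages**2 / 2**(hash_bits + 1)

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# One billion messages hashed with SHA-256 (n = 256).
sha256_collision = collision_probability(10**9, 256)

# One impact every 30 million years, expressed per second.
asteroid_next_second = 1 / (30e6 * SECONDS_PER_YEAR)

# How many orders of magnitude separate the two events.
orders_of_magnitude = math.log10(asteroid_next_second / sha256_collision)

print(f"SHA-256 collision:    {sha256_collision:.2e}")
print(f"asteroid next second: {asteroid_next_second:.2e}")
print(f"gap: about {orders_of_magnitude:.1f} orders of magnitude")
```

Without rounding the asteroid figure down to $10^{-15}$ first, the gap comes out slightly above 44 orders of magnitude, which is consistent with the roughly 45 quoted above.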
In a security setup, where an attacker gets to choose the messages which will be hashed, then the attacker may use substantially more than a billion messages; however, you will find that the attacker's success probability will still be vanishingly small. That's the whole point of using a hash function with a 256-bit output: so that risks of collision can be neglected.
Of course, all of the above assumes that SHA-256 is a "perfect" hash function, which is far from being proven. Still, SHA-256 seems quite robust.
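The probability of a collision does not depend on the size of the files, only on their number.
This is an instance of the birthday paradox. The Wikipedia page gives an estimate of the collision probability. If you run the numbers, you will find that all the hard disks ever produced on Earth could not hold enough 1 MB files to give even a 0.01% chance of a SHA-256 collision.
Basically, you can simply ignore the possibility.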
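To make "run the numbers" concrete, here is a minimal sketch that inverts the birthday-bound approximation $p^2/2^{n+1}$ to find how many files are needed for a 0.01% collision chance. The storage figure it compares against is an assumption chosen to be deliberately generous, not a measured value.

```python
import math

def files_for_probability(target: float, hash_bits: int) -> float:
    """Solve p^2 / 2^(n+1) = target for p, i.e. p = sqrt(target * 2^(n+1))."""
    return math.sqrt(target * 2.0 ** (hash_bits + 1))

FILE_SIZE_BYTES = 10**6  # 1 MB per file, as in the question

# ASSUMPTION: a generous stand-in for "all storage on Earth" (one yottabyte).
# This is a hypothetical upper bound for illustration, not a sourced figure.
ASSUMED_WORLD_STORAGE_BYTES = 1e24

files_needed = files_for_probability(1e-4, 256)  # 0.01% = 1e-4
bytes_needed = files_needed * FILE_SIZE_BYTES
shortfall = bytes_needed / ASSUMED_WORLD_STORAGE_BYTES

print(f"files needed:  {files_needed:.2e}")
print(f"bytes needed:  {bytes_needed:.2e}")
print(f"that is {shortfall:.1e} times the assumed world storage")
```

Under that assumption, roughly $4.8 \times 10^{36}$ files are required, and storing them would take many orders of magnitude more space than the assumed total.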
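First of all, it is not zero, just very close to zero.
The key question is what happens if a collision actually does occur. If the answer is "a nuclear power plant will explode", then you probably should not ignore the probability of a collision. In most cases the consequences are not that dire, so you can ignore it.
Also don't forget that your software (or a small part of it) may be deployed and used simultaneously on a large number of computers (including some of the tiny embedded microcomputers that are almost ubiquitous nowadays). In that case you need to multiply the estimate you obtained by the largest possible number of copies.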
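As a rough illustration of that multiplication, the sketch below scales a per-deployment collision probability by a number of independent copies. The copy count of ten billion is a hypothetical figure, and the per-deployment value reuses the birthday-bound approximation for one billion files per copy.

```python
import math

def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Birthday-bound approximation p^2 / 2^(n+1)."""
    return num_messages**2 / 2**(hash_bits + 1)

# ASSUMPTION: hypothetical number of independent deployments.
NUM_COPIES = 10**10

per_copy = collision_probability(10**9, 256)

# Simple multiplication (a union bound): at least one collision anywhere.
scaled = NUM_COPIES * per_copy

# Exact value 1 - (1 - q)^k, computed with log1p/expm1 because the naive
# form rounds to zero in floating point for probabilities this small.
exact = -math.expm1(NUM_COPIES * math.log1p(-per_copy))

print(f"per copy:        {per_copy:.2e}")
print(f"across all:      {scaled:.2e}")
print(f"exact, for ref.: {exact:.2e}")
```

For probabilities this small the plain multiplication and the exact formula agree to many digits, so the simple rule of thumb is adequate.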