hash - Is it safe to ignore the possibility of SHA collisions in practice?

Question

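Let's say we have a billion unique images, one megabyte each. We calculate the SHA-256 hash of the contents of each file. The possibility of a collision depends on:

the number of files
~~the size of a single file~~

How far can we get by ignoring this possibility, that is, by assuming it is zero?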

score 451 · Accepted Answer

The usual answer goes thus: what is the probability that a rogue asteroid crashes on Earth within the next second, obliterating civilization-as-we-know-it, and killing off a few billion people? It can be argued that any unlucky event with a probability lower than that is not actually very important.

If we have a "perfect" hash function with output size n, and we have p messages to hash (individual message length is not important), then the probability of a collision is about p²/2ⁿ⁺¹ (this is an approximation which is valid for "small" p, i.e. substantially smaller than 2^(n/2)). For instance, with SHA-256 (n=256) and one billion messages (p=10⁹), the probability is about 4.3*10^-60.
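As a quick check of that figure, here is a minimal Python sketch of the approximation p²/2ⁿ⁺¹. The function name `collision_probability` is made up for illustration, and the result is only meaningful while p stays well below 2^(n/2).

```python
def collision_probability(p: int, n: int) -> float:
    """Birthday-bound approximation p^2 / 2^(n+1).

    p: number of hashed messages
    n: output size of an idealized ("perfect") hash function, in bits
    """
    # Python integers are arbitrary precision, so 2 ** (n + 1) is exact;
    # the true division then rounds once to a float.
    return p * p / 2 ** (n + 1)


# One billion messages hashed with SHA-256 (n = 256).
prob = collision_probability(10**9, 256)
print(f"{prob:.1e}")  # prints 4.3e-60
```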

A mass-murderer space rock happens about once every 30 million years on average. This puts the probability of such an event occurring in the next second at about 10^-15. That's about 45 orders of magnitude more probable than the SHA-256 collision. Briefly stated, if you find SHA-256 collisions scary then your priorities are wrong.
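The comparison can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, a year is assumed to be 365.25 days, and the "once every 30 million years" rate is taken from the answer as given.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

# Chance that a once-per-30-million-years event happens in the next second.
asteroid_per_second = 1 / (30e6 * SECONDS_PER_YEAR)

# Birthday-bound estimate for 10^9 messages and a 256-bit hash.
sha256_collision = (10**9) ** 2 / 2**257

# How many orders of magnitude separate the two probabilities.
orders = math.log10(asteroid_per_second / sha256_collision)

print(f"asteroid in the next second: {asteroid_per_second:.1e}")
print(f"SHA-256 collision:           {sha256_collision:.1e}")
print(f"orders of magnitude apart:   {orders:.1f}")
```

Computed without rounding, the gap is a little over 44 orders of magnitude; rounding both probabilities to powers of ten (10^-15 and 10^-60) gives the 45 quoted in the answer. Either way the conclusion is unchanged.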

In a security setup, where an attacker gets to choose the messages which will be hashed, the attacker may use substantially more than a billion messages; however, you will find that the attacker's success probability is still vanishingly small. That's the whole point of using a hash function with a 256-bit output: so that the risk of collision can be neglected.
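To see how little a larger message count changes things, the same approximation can be evaluated for a much bigger budget. The figure of 2^80 hashed messages below is a hypothetical attacker budget picked purely for illustration, not a number from the answer, and the estimate still assumes an idealized hash function.

```python
from fractions import Fraction


def collision_probability(p: int, n: int = 256) -> Fraction:
    """Exact value of the approximation p^2 / 2^(n+1) as a rational number."""
    return Fraction(p * p, 2 ** (n + 1))


attacker_messages = 2**80  # hypothetical budget, for illustration only
prob = collision_probability(attacker_messages)

# 2^160 / 2^257 reduces to 2^-97.
print(prob == Fraction(1, 2**97))
print(f"{float(prob):.1e}")
```

Even with this assumed budget, the estimate stays below 10^-29.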

Of course, all of the above assumes that SHA-256 is a "perfect" hash function, which is far from being proven. Still, SHA-256 seems quite robust.

score 54 · Accepted Answer

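The possibility of a collision does not depend on the size of the files, only on their number.

This is an instance of the birthday paradox. The Wikipedia page gives an estimate of the collision probability. If you run the numbers, you will find that all the hard disks ever produced on Earth could not hold enough 1 MB files to push the probability of a SHA-256 collision to even 0.01%.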

Basically, you can simply ignore the possibility.
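One way to "run the numbers" is to invert the birthday approximation p²/2ⁿ⁺¹ and solve for the number of files that gives a 0.01% collision probability. The sketch below does that; the helper name `messages_for_probability` is invented for this example, and 1 MB is taken to mean 10^6 bytes.

```python
import math


def messages_for_probability(target: float, n: int = 256) -> float:
    """Solve p^2 / 2^(n+1) = target for p, the number of hashed messages."""
    return math.sqrt(target * 2 ** (n + 1))


files_needed = messages_for_probability(1e-4)  # 0.01% collision probability
bytes_needed = files_needed * 10**6            # assumption: 1 MB = 10^6 bytes

print(f"{files_needed:.1e} files, {bytes_needed:.1e} bytes")
```

The result is on the order of 10^36 files, or 10^42 bytes, which is many orders of magnitude beyond zettabyte-scale (10^21 bytes) storage.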

score 21 · Accepted Answer

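First of all, it is not zero, but very close to zero.

The key question is what happens if a collision actually occurs. If the answer is "a nuclear power plant will explode", then you probably should not ignore the possibility of a collision. In most cases the consequences are not that dire, so you can ignore it.

Also don't forget that your software (or a tiny part of it) might be deployed and used simultaneously on a huge number of computers (including the tiny embedded microcomputers that are almost everywhere nowadays). In that case you need to multiply the estimate you have obtained by the largest possible number of copies.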
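The multiplication suggested above can be sketched as follows. The per-copy probability reuses the 4.3*10^-60 estimate for a billion files, while the count of 10^10 deployed copies is a made-up figure for illustration; the function name is likewise hypothetical. Each copy is assumed to hash its own independent set of files.

```python
import math


def any_copy_collides(per_copy: float, copies: int) -> float:
    """Probability that at least one of `copies` independent deployments
    sees a collision, i.e. 1 - (1 - per_copy) ** copies.

    log1p/expm1 keep precision: a naive 1 - per_copy rounds to exactly 1.0
    for probabilities this small.
    """
    return -math.expm1(copies * math.log1p(-per_copy))


per_copy = 4.3e-60  # estimate for one deployment hashing 10^9 files
copies = 10**10     # hypothetical number of deployed copies

total = any_copy_collides(per_copy, copies)
print(f"{total:.1e}")
```

For probabilities this small the exact expression and the simple product `per_copy * copies` agree to many digits, which is why multiplying is a reasonable shortcut.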
