
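In my web application, I store cache files with hash-generated file names in various subdirectories to improve performance. One way I know I can improve performance further is to make sure the generated names follow the 8.3 filename format, so that NTFS does not have to generate short file names (I can't change that setting in the registry).

To do that, though, I would have to trim the hash (I'm thinking SHA1) down to 8 characters, which obviously increases the likelihood of collisions considerably. What would the probability of a collision be?

I have seen an answer here about the collision rate of a full SHA1 hash, but my math is terrible, so calculating this value is well beyond me.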


2 Answers


Since SHA-1's output is uniformly distributed, you can approximate the collision rate using the Birthday Paradox:

Assume you keep n bits of the SHA-1 output. There is then a roughly 50% chance of at least one collision in a set containing about 2^(n/2) records; as a rough rule of thumb, you can treat the collision rate as being on the order of 1/2^(n/2).

If you need a more accurate answer, you can always use the formula in the answer you've referenced in your question.
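
The birthday estimate can be checked numerically. The sketch below is only an illustration, assuming the truncated hash behaves like a uniform random n-bit value; the function names `collision_probability` and `records_for_probability` are made up for this example. It uses the standard approximation P ≈ 1 - exp(-k(k-1)/2^(n+1)) for k records. Under that approximation the chance of at least one collision at exactly 2^(n/2) records is about 39%, and it reaches 50% at roughly 1.18 * 2^(n/2) records, so "~50% at 2^(n/2)" is a fair order-of-magnitude statement.

```python
import math


def collision_probability(n_bits: int, records: int) -> float:
    """Approximate P(at least one collision) among `records` uniform n-bit values."""
    space = 2 ** n_bits
    # Birthday approximation: 1 - exp(-k(k-1) / (2N)), with N = 2^n.
    exponent = -records * (records - 1) / (2 * space)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


def records_for_probability(n_bits: int, p: float) -> float:
    """Approximate record count at which the collision probability reaches p."""
    space = 2 ** n_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for n_bits in (32, 64):
        k = 2 ** (n_bits // 2)
        print(n_bits, k,
              round(collision_probability(n_bits, k), 3),
              round(records_for_probability(n_bits, 0.5)))
```

With 32 bits kept, this puts the 50% point at roughly 77,000 records, far fewer than a cache with hundreds of thousands of files would hold.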

So here, if we assume each character is 1 byte (8 bits), 8 characters give n = 64 bits, and you will most likely encounter a collision once you have ~2^(8*8/2) = 2^32 = 4294967296 records (so the collision rate is about 1/4294967296 ≈ 2.33 * 10^-10, which is very small).

As for the collision rate you found with your test program: the ToSHA1Fingerprint() function returns a hexadecimal string, so each character carries only 4 bits and an 8-character substring represents only 4 bytes (32 bits). The approximate collision rate from the formula above is therefore 1/2^(4*8/2) = 1/65536 ≈ 0.000015258789, or about 0.0015%.
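
The figure measured by the test program is a slightly different quantity from the rule of thumb: it is the share of names in one batch that duplicate an earlier name. For k uniform n-bit values with k much smaller than 2^n, the expected number of distinct values is about k - k(k-1)/2^(n+1), so the expected duplicate share is about (k-1)/2^(n+1). The sketch below evaluates that for the 360000 names per batch used in the question's test; `expected_duplicate_fraction` is a name chosen for this illustration, and uniform hash output is assumed.

```python
def expected_duplicate_fraction(n_bits: int, records: int) -> float:
    """Expected share of records whose n-bit value repeats an earlier one.

    Uses the small-load approximation (k - 1) / 2^(n + 1), which is valid
    when `records` is much smaller than 2^n_bits.
    """
    space = 2 ** n_bits
    return (records - 1) / (2 * space)


if __name__ == "__main__":
    # 8 hex characters = 32 bits; 8 raw bytes = 64 bits.
    for n_bits in (32, 64):
        share = expected_duplicate_fraction(n_bits, 360_000)
        print(f"{n_bits} bits: {share * 100:.6g}%")
```

For 32 bits and 360000 records this gives about 0.0042%, which lines up with the ~0.004% reported by the test. The share grows linearly with the number of files, so it is not a fixed property of the 8-character name.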

answered 2014-03-03T20:08:55.673

It looks like the collision rate is too high for my needs; I'm getting ~0.004% in tests with the following code.

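// Assumes: using System.Collections.Generic; using System.IO; using System.Linq;
// ToSHA1Fingerprint() is my own string extension (not shown); it returns the SHA-1 hash as a hex string.
const int Iterations = 10;
const int Maxitems = 360000;

// Duplicate rate (in percent) measured in each iteration.
List<double> collisions = new List<double>();

for (int i = 0; i < Iterations; i++)
{
    List<string> paths = new List<string>();

    for (int j = 0; j < Maxitems; j++)
    {
        // Hash a random file name and keep the first 8 hex characters (32 bits).
        string path = Path.GetRandomFileName().ToSHA1Fingerprint()
                                              .Substring(0, 8);

        paths.Add(path);
    }

    // Names lost to collisions = total generated minus distinct names.
    int count = paths.Distinct().Count();

    double collisionRate = ((Maxitems - count) * 100D) / Maxitems;
    collisions.Add(collisionRate);
}

double averageCollisionRate = collisions.Average();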
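
For anyone who wants to reproduce the measurement without the ToSHA1Fingerprint() extension, here is a rough Python equivalent of the test above. It is a sketch under assumptions: a seeded random 64-bit hex string stands in for Path.GetRandomFileName(), and `measured_duplicate_rate` is just a name chosen for this example.

```python
import hashlib
import random


def measured_duplicate_rate(items: int, hex_chars: int = 8, seed: int = 0) -> float:
    """Percentage of generated names that duplicate an earlier truncated SHA-1 name."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(items):
        # Stand-in for a random file name: any random input string will do.
        name = f"{rng.getrandbits(64):016x}"
        digest = hashlib.sha1(name.encode("ascii")).hexdigest()
        # Keep only the first `hex_chars` hex characters (4 bits each).
        seen.add(digest[:hex_chars])
    return (items - len(seen)) * 100.0 / items


if __name__ == "__main__":
    print(measured_duplicate_rate(360_000))
```

Using a set instead of a list plus a distinct pass keeps memory and run time lower, but the measured quantity is the same as in the C# version.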
answered 2013-03-18T22:12:57.970