c++ - C++ Two distinct strings with the same 40 bit hash value

Question

I am having some trouble with an assignment, I am to create an algorithm for a given function that takes a string and turns it into a 40 bit hash. With this, I must find two distinct hashes that have the same value. The TA gave us a hint about using the birthday paradox to find the amount of distinct strings before I get a reasonable probability. My question is,how should I approach this given no strings and no set length they must be.

score 3 · Accepted Answer

考虑到有关“生日悖论”的提示（这根本不是悖论），我假设您的作业要求您生成许多字符串并将它们散列并找到两个发生冲突的字符串。

由于您使用的是 40 位哈希函数，因此您预计必须尝试 2 ^{20 个}字符串（平均）才能找到第一次冲突。处理它的一种方法是生成和散列任何长度为零及以上的字符串。

一种方法是这样的：（我实际上没有尝试过任何代码。）

using std::string;

template <typename F>
bool GenAllStrings_Worker (string & s, unsigned idx, string const & char_set,
                           unsigned long long & cnt, F f)
{
    if (idx >= s.size())
        return f(s, ++cnt);

    for (auto c : char_set)
    {
        s[idx] = c;
        if (GenAllStrings_Worker(s, idx + 1, char_set, cnt, f))
            return true;
    }

    return false;
}

// Continues generating successively longer strings until "f" returns true.
// Passes each generated string and number of strings generated so far to "f".
template <typename F>
void GenAllStrings (string const & char_set, F f)
{
    unsigned long long cnt = 0;

    for (unsigned len = 0; ; ++len)
    {
        string s (len, '?');
        if (GenAllStrings_Worker (s, 0, char_set, cnt, f))
            return;
    }
}

你可以像这样使用它：（不要忘记你需要提供hash_code类型和MyHashFunction函数。）

std::unordered_map<hash_code, string> generated_hashes;

GenAllStrings ("abcdef...ABCD...0123...whatever",
    [&](string const & s, unsigned long long cnt){
        hash_code h = MyHashFunction(s);
        if (generated_hashes.find(h) != generated_hashes.end())
        {
            std::cout << "#" << cnt << " - Found a collision: '" << s <<"'"
                 << " collides with '" << generated_hashes[h] << "'"
                 << ", with a hash of " << h << std::endl;
            return true;
        }
        h[h] = s;
        return false;
    });

我已经编写了传递给GenAllStrings函数以在第一次碰撞后停止的 lambda；但是你可以产生任何你想要（或有时间）的碰撞，在你达到一定长度的字符串等之后停止。

score 1 · Accepted Answer

提示：如果您应用生日悖论，您需要多少个随机字符串才能使这些字符串中的 2 个具有相同的 40 位散列的概率很大？

为什么不编写一个生成这么多字符串的程序，然后尝试找到冲突呢？

c++ - C++ Two distinct strings with the same 40 bit hash value

2 回答 2

Related

Reference