algorithm - SHA256: finding a partial collision

Question

I have two messages:

messageA: "Frank is one of the "best" students topicId{} "

messageB: "Frank is one of the "top" students topicId{} "

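I need to find a partial SHA256 collision (8 digits) for these two messages. That is, the first 8 hex digits of SHA256(messageA) must equal the first 8 hex digits of SHA256(messageB).

We can put any letters and numbers inside {}, but both {} must contain the same string.

I have already tried brute force and a birthday attack with a hash table, but both take too much time. I know about cycle-detection algorithms such as Floyd's and Brent's, but I don't know how to construct a cycle for this problem. Is there another way to solve it? Thanks a lot!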

score 2 · Accepted Answer

Solving this with a birthday attack is very simple. Here is how I did it in Python (v2):

def find_collision(ntries):
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%d} '
    str2 = 'Frank is one of the "top" students topicId{%d} '
    seen = {}
    for n in xrange(ntries):
        h = sha256(str1 % n).digest()[:4].encode('hex')
        seen[h] = n
    for n in xrange(ntries):
        h = sha256(str2 % n).digest()[:4].encode('hex')
        if h in seen:
            print str1 % seen[h]
            print str2 % n

find_collision(100000)

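If your attempt took too long to find a solution, then you either made a coding mistake somewhere or used the wrong data type.

Python's dictionary type is implemented as a hash table, which means a lookup takes constant time on average. If you had implemented seen as a list instead of a dictionary in the code above, the search on line 11 (if h in seen) would have to scan the whole list and would take far longer.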
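
For readers on Python 3, the same two-pass birthday search might be sketched as follows. This is a minimal port under stated assumptions, not the answerer's code: the prefix helper, the prefix_bytes parameter and the returned list of pairs are additions for illustration, and, like the snippet above, it lets the two topicId values differ.

```python
from hashlib import sha256

STR1 = 'Frank is one of the "best" students topicId{%d} '
STR2 = 'Frank is one of the "top" students topicId{%d} '


def prefix(message, prefix_bytes):
    """Return the first prefix_bytes bytes of SHA256(message) as hex."""
    return sha256(message.encode("ascii")).digest()[:prefix_bytes].hex()


def find_collisions(ntries, prefix_bytes=4):
    """Return (n1, n2) pairs where STR1 % n1 and STR2 % n2 share a digest prefix."""
    seen = {}  # digest prefix -> topicId used in STR1
    for n in range(ntries):
        seen[prefix(STR1 % n, prefix_bytes)] = n
    pairs = []
    for n in range(ntries):
        h = prefix(STR2 % n, prefix_bytes)
        if h in seen:  # average O(1) lookup because seen is a dict
            pairs.append((seen[h], n))
    return pairs


if __name__ == "__main__":
    # May print nothing: with a 4-byte prefix, 100000 tries per side
    # is not guaranteed to produce a match.
    for n1, n2 in find_collisions(100000):
        print(STR1 % n1)
        print(STR2 % n2)
```

Shrinking prefix_bytes to 2 makes matches plentiful even for a few thousand tries, which is a convenient way to check the logic before running the full 4-byte search.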

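Edit:

If the two topicId tokens must be identical, then, as pointed out in the comments, there is no option but to churn through something on the order of 2^31 values. You will find a collision eventually, but it may take a long time.

Let it run overnight and, with a bit of luck, you will have an answer in the morning: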

def find_collision():
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%x} '
    str2 = 'Frank is one of the "top" students topicId{%x} '
    n = 0
    while True:
        if sha256(str1 % n).digest()[:4] == sha256(str2 % n).digest()[:4]:
            print str1 % n
            print str2 % n
            break
        n += 1

find_collision()

If you are in a hurry, you could consider using a GPU to speed up the hash computations.
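
The same-topicId loop above might look like this on Python 3. It is a sketch under stated assumptions rather than the answerer's code: prefix_bytes, start and max_tries are hypothetical parameters added so the search can be tried on a short prefix first, and no GPU acceleration is attempted.

```python
from hashlib import sha256
from itertools import count

STR1 = 'Frank is one of the "best" students topicId{%x} '
STR2 = 'Frank is one of the "top" students topicId{%x} '


def find_same_id_collision(prefix_bytes=4, start=0, max_tries=None):
    """Return the first n >= start whose two messages share a digest prefix.

    Returns None if max_tries candidates were checked without a match.
    """
    for tries, n in enumerate(count(start)):
        if max_tries is not None and tries >= max_tries:
            return None
        d1 = sha256((STR1 % n).encode("ascii")).digest()[:prefix_bytes]
        d2 = sha256((STR2 % n).encode("ascii")).digest()[:prefix_bytes]
        if d1 == d2:
            return n


if __name__ == "__main__":
    # 2 bytes instead of 4 so the demo finishes quickly
    # (about 2**16 candidates expected instead of about 2**32).
    n = find_same_id_collision(prefix_bytes=2)
    print(STR1 % n)
    print(STR2 % n)
```

Each extra prefix byte multiplies the expected number of candidates by 256, which is why the full 4-byte search is an overnight job in pure Python.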

score 2 · Accepted Answer

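I'm assuming the trailing space at the end of the strings in the question is intentional, so I left it in.

"Frank is one of the "top" students topicId{59220691223} " 6026d9b3 23898bcd7ecdbcbcd575b0a1d9dc22fd9e60074aefcbaade494a50ae

"Frank is one of the "best" students topicId{59220691223} " 6026d9b3 1ba780bb9973e7cfc8c9f74a35b54448d441a61cc9bf8db0fcae5280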
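
A reader who wants to check the reported pair could recompute the digests with a few lines of Python. This is a minimal sketch: sha256_prefix is a hypothetical helper name, and the output is not reproduced here. According to the answer, both lines should end in 6026d9b3.

```python
from hashlib import sha256


def sha256_prefix(message, hex_digits=8):
    """Return the first hex_digits hex characters of SHA256(message)."""
    return sha256(message.encode("utf-8")).hexdigest()[:hex_digits]


if __name__ == "__main__":
    topic_id = "59220691223"  # value reported in the answer above
    for word in ("top", "best"):
        # The trailing space after "}" is part of the message.
        message = 'Frank is one of the "%s" students topicId{%s} ' % (word, topic_id)
        print(repr(message), sha256_prefix(message))
```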

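Found with brute force; it actually took about 7 billion tries, far more than I expected.

I figured that 2^32 is about 4.3 billion, so the chance of finding no match after 4.3 billion tries is about 36.8% (roughly 1/e).

As it turned out, I found the match after about 7 billion tries, and the chance of having no match within 7 billion tries is under 20%.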
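
The two percentages above can be reproduced with a short calculation. This sketch assumes each candidate topicId matches independently with probability 2^-32 (that is, the 32-bit digest prefixes behave like uniform random values); p_no_match is a hypothetical helper name.

```python
import math


def p_no_match(tries, bits=32):
    """Probability that none of `tries` independent attempts matches,
    when each attempt matches with probability 2**-bits."""
    # log1p keeps precision, since 2**-32 is tiny compared with 1.
    return math.exp(tries * math.log1p(-2.0 ** -bits))


if __name__ == "__main__":
    print("after 2**32 tries: %.4f" % p_no_match(2 ** 32))
    print("after 7e9 tries:   %.4f" % p_no_match(7e9))
```

The first value comes out at roughly 0.368 and the second at roughly 0.196, in line with the figures quoted in the answer.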

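Here is the C++ code I ran on 7 threads. Each thread gets a different starting point, and the program exits as soon as any thread finds a match. Each thread also reports its progress once every 1 million tries.

I've fast-forwarded to the point where the match was found on threadId=5, so it runs in under a minute. If you change the starting points, though, you can look for other matches.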
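
To see how short the fast-forwarded run is, the starting points can be worked out by hand. The sketch below mirrors the start expression from the C++ listing in this answer; the constant and function names are hypothetical, and the match value is the topicId reported above.

```python
STRIDE = 11666666667 + 147000000  # per-thread spacing from the C++ code
FAST_FORWARD = 150000000          # the extra "+ 150000000ULL" term
REPORTED_MATCH = 59220691223      # topicId reported in this answer


def start_for(thread_id):
    """Starting topicId for a given thread, mirroring the C++ expression."""
    return thread_id * STRIDE + FAST_FORWARD


if __name__ == "__main__":
    for t in range(7):
        print(t, start_for(t))
    print("candidates before the match on thread 5:",
          REPORTED_MATCH - start_for(5))
```

Thread 5 starts at 59218333335, so it checks about 2.36 million candidates before reaching the reported match.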

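I'm also not sure how you would use Floyd's or Brent's algorithm here: the strings must use the same topicId, so you are locked into the fixed prefix and suffix.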

/*
To compile go get picosha2 header file from https://github.com/okdshin/PicoSHA2 
Copy this code into same directory as picosha2.h file, save it as hash.cpp for example.
On Linux go to command line and cd to directory where these files are. 

To compile it:
g++ -O2 -std=c++11 -pthread -o hash hash.cpp

And run it:
./hash

*/

#include <atomic>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// I used picoSHA2 header only file for the hashing
// https://github.com/okdshin/PicoSHA2
#include "picosha2.h"


// return 1st 4 bytes (8 chars) of SHA256 hash
std::string hash8(const std::string& src_str) {
    std::vector<unsigned char> hash(picosha2::k_digest_size);
    picosha2::hash256(src_str.begin(), src_str.end(), hash.begin(), hash.end());
    return picosha2::bytes_to_hex_string(hash.begin(), hash.begin() + 4);
}

// Set by whichever thread finds a match. Atomic, so reading it from the
// other threads without a lock is not a data race.
std::atomic<bool> done{false};
std::mutex mtxCout;

void work(unsigned long long threadId) {
    std::string a = "Frank is one of the \"best\" students topicId{",
        b = "Frank is one of the \"top\" students topicId{";

    // Each thread gets a different starting point, I've fast forwarded to the part 
    // where I found the match so this won't take long to run if you try it, < 1 minute.
    // If you want to run a while drop the last "+ 150000000ULL" term and it will run 
    // for about 1 billion total (150 million each thread, assuming 7 threads) take 
    // about 30 minutes on Linux.
    // Collision occurred on threadId = 5, so if you change it to use fewer than 6 threads  
    // then your mileage may vary.

    unsigned long long start = threadId * (11666666667ULL + 147000000ULL) + 150000000ULL;
    unsigned long long x = start;

    for (;;) {
        // Stop as soon as any thread has reported a match. It is unlikely that
        // 2 collisions are found at once on separate threads, and writing to cout
        // is guarded by mtxCout anyway.

        if (done) return;
        std::string xs = std::to_string(x++);
        std::string hashA = hash8(a + xs + "} "), hashB = hash8(b + xs + "} ");
        
        if (hashA == hashB) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "*** SOLVED ***" << std::endl;
            std::cout << (x-1) << std::endl;
            std::cout << "\"" << a << (x - 1) << "} \" = " << hashA << std::endl;
            std::cout << "\"" << b << (x - 1) << "} \"  = " << hashB << std::endl;
            done = true;
            return;
        }
        
        if (((x - start) % 1000000ULL) == 0) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "thread: " << threadId << " = " << (x-start) 
                << " tries so far" << std::endl;
        }
    }
}

void runBruteForce() {
    const int NUM_THREADS = 7;
    std::thread threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) threads[i] = std::thread(work, i);
    for (int i = 0; i < NUM_THREADS; i++) threads[i].join();
}

int main(int argc, char** argv) {
    runBruteForce();
    return 0;
}
