c++ - How to access millions of bits for hashing

Question

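I'm computing an MD5 hash of an executable. I used a Python script to dump the executable's binary into a text file, but if I read that generated file into a C program I'll be dealing with megabytes of data, because the 1s and 0s are treated as chars, so each one-bit digit takes up 8 bits. Is it possible to read these as individual bits? If I made, say, a 10MB array to hold all the characters that the binary conversion and the hash padding might need, how badly would the program perform? If that's unthinkable, is there a better way to handle the data?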

score 1 · Accepted Answer

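Since you tagged the question both C and C++, I'll go with C.

Is it possible to read these as individual bits?

Yes. Read 8 bytes (8 characters) at a time from the file and combine those 1s and 0s into a single new byte. You don't need a 10MB array for this.

First, read 8 bytes from the file. Each char read is converted to its integer value (0 or 1) and then shifted into place to build the new byte.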

unsigned char bits[8];
while (fread(bits, 1, 8, file) == 8) {
    /* turn the characters '0' and '1' into the integers 0 and 1 */
    for (unsigned int i = 0; i < 8; i++) {
        bits[i] -= '0';
    }

    /* the first character read becomes the most significant bit */
    unsigned char byte = (bits[0] << 7) | (bits[1] << 6) |
                         (bits[2] << 5) | (bits[3] << 4) |
                         (bits[4] << 3) | (bits[5] << 2) |
                         (bits[6] << 1) | (bits[7]     );

    /* update MD5 Hash here */
}

You then update your MD5 hash with the newly assembled byte.
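
The loop above trusts that every character in the file is a `0` or a `1`. If the Python script also wrote newlines or spaces, the subtraction would quietly produce garbage bits. Here is a minimal sketch of the same packing step with validation added. The name `pack_bits` and the choice of -1 as the error value are my own assumptions, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: packs 8 characters, each '0' or '1', into one byte,
 * most significant bit first. Returns the byte value (0..255), or -1 if any
 * of the 8 characters is something else, such as a newline. */
int pack_bits(const char *bits)
{
    assert(bits != NULL);

    unsigned int byte = 0;
    for (size_t i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1') {
            return -1; /* not a bit character */
        }
        /* shift the bits collected so far and append the new one */
        byte = (byte << 1) | (unsigned int)(bits[i] - '0');
    }
    return (int)byte;
}
```

For example, `pack_bits("01000001")` yields 0x41, the byte for the letter `A`, while a group containing a newline is rejected instead of being hashed.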

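Edit: A typical MD5 implementation has to split its input into 512-bit blocks before processing. You can take that overhead out of the implementation itself (though I don't recommend it) by assembling 512 bits (64 bytes) from the file and then updating the hash with the whole block directly.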

unsigned char buffer[64];
unsigned char bits[8];
unsigned int index = 0;

while (fread(bits, 1, 8, file) == 8) {
    for (unsigned int i = 0; i < 8; i++) {
        bits[i] -= '0';
    }

    buffer[index++] = (bits[0] << 7) | (bits[1] << 6) |
                      (bits[2] << 5) | (bits[3] << 4) |
                      (bits[4] << 3) | (bits[5] << 2) |
                      (bits[6] << 1) | (bits[7]     );

    if (index == 64) {
        index = 0;
        /* update MD5 hash with 64 byte buffer */
    }
}

/* This sends the remaining data to the MD5 hash function */
/* It's not likely that your file has exactly 512N chars */
if (index != 0) {
    while (index != 64) {
        buffer[index++] = 0;
    }
    /* update MD5 hash with the padded buffer. */
}
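
One caveat about the final block: filling it with zeros means those zeros become part of the hashed message, so the digest will not match the MD5 of the original executable unless its length happens to be a multiple of 64 bytes. MD5's own padding (a 1 bit, then zeros, then the 64-bit message length) is normally added by the library's final step. Below is a sketch that hands the last partial block over with its real length instead. It assumes your MD5 library has an update function that accepts a pointer and a byte count, as most do. The names `block_fn`, `pack_stream`, `struct sink` and `sink_update` are hypothetical, and `sink_update` only records bytes as a stand-in for the real hash update.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives len packed bytes to be hashed. */
typedef void (*block_fn)(const unsigned char *data, size_t len, void *ctx);

/* Reads '0'/'1' characters from file, packs them into bytes and passes them
 * to update in blocks of up to 64 bytes. Returns the number of bytes packed,
 * or -1 if the input has a non-bit character or a trailing partial group. */
long pack_stream(FILE *file, block_fn update, void *ctx)
{
    assert(file != NULL && update != NULL);

    unsigned char buffer[64];
    char bits[8];
    size_t index = 0, got;
    long total = 0;

    while ((got = fread(bits, 1, sizeof bits, file)) == sizeof bits) {
        unsigned int byte = 0;
        for (size_t i = 0; i < sizeof bits; i++) {
            if (bits[i] != '0' && bits[i] != '1') {
                return -1;
            }
            byte = (byte << 1) | (unsigned int)(bits[i] - '0');
        }
        buffer[index++] = (unsigned char)byte;
        total++;
        if (index == sizeof buffer) { /* a full 512-bit block is ready */
            update(buffer, index, ctx);
            index = 0;
        }
    }
    if (got != 0) {
        return -1; /* leftover characters that do not form a whole byte */
    }
    if (index != 0) {
        update(buffer, index, ctx); /* last block, real length, no zero fill */
    }
    return total;
}

/* Stand-in for a real MD5 update: just records the bytes it is given. */
struct sink { unsigned char data[256]; size_t len; };

void sink_update(const unsigned char *data, size_t len, void *ctx)
{
    struct sink *s = ctx;
    if (len > sizeof s->data - s->len) {
        len = sizeof s->data - s->len; /* drop what does not fit */
    }
    memcpy(s->data + s->len, data, len);
    s->len += len;
}
```

To use it with a real library, replace `sink_update` with a small wrapper around that library's update call and run its final step after `pack_stream` returns. Note that this sketch is strict: a trailing newline in the text file makes it return -1.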
