c - Bitwise XOR in C using 64 bits instead of 8 bits

Question

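I'm thinking about how to XOR two byte arrays efficiently. The arrays are defined as unsigned char *, and I think XORing them as uint64_t would be much faster. Is that true? How can I efficiently convert unsigned char * to uint64_t *, preferably inside the XOR loop? And how do I pad the last bytes if the array length % 8 isn't 0?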
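Padding is not strictly required (a plain byte loop over the last length % 8 bytes works too), but if you do want to pad, here is a minimal sketch. It assumes the complete 8-byte chunks are processed elsewhere, and xor_tail_padded is a hypothetical helper name. The final partial chunk is copied into zero-initialized uint64_t temporaries, XORed as one word, and only the valid bytes are copied back. The zero padding cannot affect the result because x ^ 0 == x.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR the final partial chunk of `rem` bytes
   (0 <= rem < 8) by zero-padding it into 8-byte temporaries. */
void xor_tail_padded(unsigned char *out, const unsigned char *a,
                     const unsigned char *b, size_t rem)
{
    uint64_t va = 0, vb = 0, vr;   /* zero-initialized padding */

    memcpy(&va, a, rem);           /* copy only the valid input bytes */
    memcpy(&vb, b, rem);
    vr = va ^ vb;                  /* padding bytes stay 0 after XOR */
    memcpy(out, &vr, rem);         /* write back only the valid bytes */
}
```

Because XOR works bit by bit, byte order does not matter here: the first rem bytes of vr are exactly the XOR of the first rem bytes of each input.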

This is my current code, which XORs the byte arrays one byte (unsigned char) at a time:

#include <stdlib.h>

unsigned char *bitwise_xor(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {

    unsigned char *XOR_Bytes_Array;

    // allocate XORed bytes array
    XOR_Bytes_Array = malloc(sizeof(unsigned char) * length);
    if (XOR_Bytes_Array == NULL)
        return NULL;

    // perform bitwise XOR operation on bytes arrays A and B, one byte at a time
    // (size_t index so it has the same type as length)
    for(size_t i=0; i < length; i++)
        XOR_Bytes_Array[i] = (unsigned char)(A_Bytes_Array[i] ^ B_Bytes_Array[i]);

    return XOR_Bytes_Array;
}

OK, in the meantime I tried to do it like this. My bytes_array is rather large (an RGBA bitmap, 4*1440*900?).

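#include <stdint.h>
#include <stdlib.h>

// Assembles the 8 bytes starting at bytesArray[i], most significant byte first (big-endian).
static uint64_t next64bitsFromBytesArray(const unsigned char *bytesArray, const size_t i) {

    uint64_t next64bits = (uint64_t) bytesArray[i+7] | ((uint64_t) bytesArray[i+6] << 8) | ((uint64_t) bytesArray[i+5] << 16) | ((uint64_t) bytesArray[i+4] << 24) | ((uint64_t) bytesArray[i+3] << 32) | ((uint64_t) bytesArray[i+2] << 40) | ((uint64_t) bytesArray[i+1] << 48) | ((uint64_t)bytesArray[i] << 56);
    return next64bits;
}

unsigned char *bitwise_xor64(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {

    unsigned char *XOR_Bytes_Array;
    size_t i;

    // allocate XORed bytes array
    XOR_Bytes_Array = malloc(sizeof(unsigned char) * length);
    if (XOR_Bytes_Array == NULL)
        return NULL;

    // perform bitwise XOR operation on bytes arrays A and B using uint64_t,
    // only over complete 8-byte chunks so we never read past the end
    for(i=0; i + 8 <= length; i+=8) {

        uint64_t A_Bytes = next64bitsFromBytesArray(A_Bytes_Array, i);
        uint64_t B_Bytes = next64bitsFromBytesArray(B_Bytes_Array, i);
        uint64_t XOR_Bytes = A_Bytes ^ B_Bytes;

        // write back in the same big-endian order the bytes were read in
        // (memcpy would store them in native order, reversing each
        // 8-byte group on little-endian machines)
        for(int k=0; k<8; k++)
            XOR_Bytes_Array[i+k] = (unsigned char)(XOR_Bytes >> (56 - 8*k));
    }

    // XOR the remaining length % 8 bytes one at a time
    for(; i < length; i++)
        XOR_Bytes_Array[i] = (unsigned char)(A_Bytes_Array[i] ^ B_Bytes_Array[i]);

    return XOR_Bytes_Array;
}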
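The shifts in next64bitsFromBytesArray build the value most significant byte first (big-endian). A store such as memcpy(XOR_Bytes_Array + i, &XOR_Bytes, 8), however, writes it in the host's native order, so on a little-endian machine each 8-byte group comes out reversed. For XOR the byte order does not matter as long as the load and the store agree. A minimal sketch of a matching native-order pair follows; load64 and store64 are hypothetical names. memcpy is the portable way to do a possibly unaligned 8-byte load without breaking strict aliasing, and optimizing compilers typically turn it into a single load instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes at p as a uint64_t in native byte order. */
static inline uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* safe for unaligned p, no aliasing violation */
    return v;
}

/* Hypothetical helper: write v back to p in the same native byte order,
   so store64(dst, load64(src)) copies the bytes unchanged on any host. */
static inline void store64(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Used together, store64(out + i, load64(a + i) ^ load64(b + i)) XORs one 8-byte chunk with the bytes landing in their original positions.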

Update (a second approach to the problem):

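#include <stdint.h>
#include <stdlib.h>
#include <string.h>

unsigned char *bitwise_xor64(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {

    unsigned char *xorBytes = malloc(sizeof(unsigned char)*length);
    if (xorBytes == NULL)
            return NULL;

    size_t i = 0;
    // Load 8 bytes at a time with memcpy instead of casting to (const uint64_t *):
    // the cast can cause misaligned reads and can violate strict aliasing.
    for(; i + 8 <= length; i += 8) {
            uint64_t aBytes, bBytes;
            memcpy(&aBytes, A_Bytes_Array + i, 8);
            memcpy(&bBytes, B_Bytes_Array + i, 8);
            uint64_t aXORbBytes = aBytes ^ bBytes;
            //printf("a XOR b = 0x%" PRIx64 "\n", aXORbBytes);
            memcpy(xorBytes + i, &aXORbBytes, 8);
    }
    // XOR the remaining length % 8 bytes one at a time
    for(; i < length; i++)
            xorBytes[i] = A_Bytes_Array[i] ^ B_Bytes_Array[i];

    return xorBytes;
}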

score 0 · Accepted Answer

So I ran an experiment:

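#include <stdlib.h>
#include <stdint.h>

#ifndef TYPE
#define TYPE uint64_t
#endif

TYPE *
xor(const void *va, const void *vb, size_t l)
{
        const TYPE *a = va;
        const TYPE *b = vb;
        TYPE *res = malloc(l);
        TYPE *r = res;          /* r advances; res keeps the start */
        size_t i;

        if (res == NULL)
                return NULL;
        /* only l / sizeof(TYPE) elements are computed; any trailing
           l % sizeof(TYPE) bytes are left unset in this experiment */
        for (i = 0; i < l / sizeof(TYPE); i++) {
                *r++ = *a++ ^ *b++;
        }
        return res;
}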

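I compiled it with clang with basic optimizations, once for uint64_t and once for uint8_t. In both cases the compiler vectorized the loop. The difference was that the uint8_t version had code to handle l not being a multiple of 8. So if we add code to handle sizes that are not a multiple of 8, you will probably end up with equivalent generated code. Also, the 64-bit version unrolled the loop several times and had code to handle that, so for large enough arrays you might gain a few percent here. On the other hand, on large enough arrays you will be memory-bound and the XOR operation won't matter.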
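The experiment's xor() leaves the trailing l % sizeof(TYPE) bytes uncomputed. A sketch of what adding that tail handling could look like is below; xor_full is a hypothetical name, and memcpy is swapped in for the pointer casts so plain byte buffers can be read as TYPE without alignment or aliasing concerns. Building it once with the default TYPE and once with -DTYPE=uint8_t would let you compare the generated code the same way.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef TYPE
#define TYPE uint64_t
#endif

/* Hypothetical variant of xor(): XOR l bytes of a and b, sizeof(TYPE)
   bytes at a time, then finish the remaining bytes individually. */
unsigned char *xor_full(const unsigned char *a, const unsigned char *b, size_t l)
{
    unsigned char *r = malloc(l ? l : 1);   /* avoid malloc(0) ambiguity */
    if (r == NULL)
        return NULL;

    size_t words = l / sizeof(TYPE);
    size_t i;
    for (i = 0; i < words; i++) {
        TYPE wa, wb, wr;
        memcpy(&wa, a + i * sizeof(TYPE), sizeof(TYPE));
        memcpy(&wb, b + i * sizeof(TYPE), sizeof(TYPE));
        wr = wa ^ wb;
        memcpy(r + i * sizeof(TYPE), &wr, sizeof(TYPE));
    }
    /* Tail: the last l % sizeof(TYPE) bytes. */
    for (i = words * sizeof(TYPE); i < l; i++)
        r[i] = a[i] ^ b[i];
    return r;
}
```

The caller owns the returned buffer and must free() it.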

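Are you sure your compiler doesn't already take care of this? This is a micro-optimization that only makes sense when you measure things, and then you don't need to ask which one is faster; you'll know.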
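Following the advice to measure, here is a minimal timing sketch. It uses standard C's clock(), which measures processor time and is coarse, so it assumes an optimized build and enough repetitions to produce meaningful numbers; xor_bytes, xor_words and time_xor are hypothetical names. Comparing the two output buffers afterwards checks correctness and also keeps the work observable, so the compiler cannot discard it.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*xor_fn)(unsigned char *, const unsigned char *,
                       const unsigned char *, size_t);

/* Byte-at-a-time loop, like the question's first version. */
static void xor_bytes(unsigned char *out, const unsigned char *a,
                      const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = a[i] ^ b[i];
}

/* 8-bytes-at-a-time loop with a bytewise tail. */
static void xor_words(unsigned char *out, const unsigned char *a,
                      const unsigned char *b, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);
        memcpy(&y, b + i, 8);
        x ^= y;
        memcpy(out + i, &x, 8);
    }
    for (; i < len; i++)
        out[i] = a[i] ^ b[i];
}

/* Return the processor time in seconds spent running f `reps` times. */
static double time_xor(xor_fn f, unsigned char *out, const unsigned char *a,
                       const unsigned char *b, size_t len, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        f(out, a, b, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For the question's 4*1440*900 RGBA bitmap the length is 5,184,000 bytes, an exact multiple of 8, so the tail loop in xor_words never runs for that size. Each buffer is about 5 MB, well beyond typical L1/L2 cache sizes, which is the regime where the answer expects the loop to be memory-bound.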
