cuda - Type casting to unsigned long long in CUDA?

Question

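Basically, what I want is a function like hiloint2uint64() that simply joins two 32-bit integers and reinterprets the result as a uint64.

I can't find any function in CUDA that does this. Failing that, is there any PTX code that can perform this kind of type cast?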

score 3 · Accepted Answer

You can define your own function like this:

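__host__ __device__ unsigned long long int hiloint2uint64(int h, int l)
{
    // NVIDIA GPUs and the usual host CPUs are little-endian, so the low word
    // must be the first array element for h to land in the upper 32 bits.
    int combined[] = { l, h };

    return *reinterpret_cast<unsigned long long int*>(combined);
}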
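As a hedged variant of the function above, the same packing can be done with a byte copy instead of a pointer cast, which does not depend on how the temporary array is aligned. This is only a sketch: the name hilo_to_u64_memcpy and the HILO_HD macro are made up for illustration, and the word order assumes a little-endian target.

```cpp
#include <cstdint>
#include <cstring>

// Lets the same source build with nvcc and with a plain host C++ compiler.
#ifdef __CUDACC__
#define HILO_HD __host__ __device__
#else
#define HILO_HD
#endif

// Hypothetical helper (not a CUDA built-in): hi -> bits 63..32, lo -> bits 31..0.
HILO_HD inline std::uint64_t hilo_to_u64_memcpy(std::uint32_t hi, std::uint32_t lo)
{
    // Little-endian assumption: element 0 supplies the low 32 bits.
    const std::uint32_t words[2] = { lo, hi };
    std::uint64_t result;
    // A byte-wise copy does not care about the alignment of `words`.
    std::memcpy(&result, words, sizeof(result));
    return result;
}
```

Compilers typically reduce a fixed-size copy like this to register moves, but that is worth checking in the generated code for your target.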

score 0 · Accepted Answer

This may be a bit late, but probably the safest approach is to do it "manually" with a bit shift and a bitwise OR:

// Convert to unsigned 32-bit first so a negative l is not sign-extended.
uint32_t ui_h = h;
uint32_t ui_l = l;
return (uint64_t(ui_h) << 32) | uint64_t(ui_l);

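Note that the array-based solution proposed in the other answer is not safe, because the integer array may not be 8-byte aligned (and in any case, shifting a few bits is faster than a memory read/write).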
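For completeness, here is the shift-and-OR idea written out as a whole function. Treat it as a minimal sketch: the name hilo_to_u64_shift and the HILO_HD macro are hypothetical, chosen only so the snippet builds both with nvcc and with a plain host compiler.

```cpp
#include <cstdint>

// Lets the same source build with nvcc and with a plain host C++ compiler.
#ifdef __CUDACC__
#define HILO_HD __host__ __device__
#else
#define HILO_HD
#endif

// Hypothetical helper (not a CUDA built-in): hi -> bits 63..32, lo -> bits 31..0.
HILO_HD inline std::uint64_t hilo_to_u64_shift(int hi, int lo)
{
    // Go through unsigned 32-bit values first. Widening a negative `lo`
    // straight to 64 bits would sign-extend it and overwrite the upper half.
    const std::uint32_t u_hi = static_cast<std::uint32_t>(hi);
    const std::uint32_t u_lo = static_cast<std::uint32_t>(lo);
    return (static_cast<std::uint64_t>(u_hi) << 32) | static_cast<std::uint64_t>(u_lo);
}
```

With this version, hilo_to_u64_shift(0, -1) yields 0x00000000FFFFFFFF. Casting the signed int directly to uint64_t would give 0xFFFFFFFFFFFFFFFF instead.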

score 0 · Accepted Answer

Use uint2 (but define the temporary variable as a 64-bit value: unsigned long long int) instead of an array to guarantee alignment. Pay attention to the order of l and h.

__host__ __device__ __forceinline__ unsigned long long int hiloint2uint64(unsigned int h, unsigned int l)
{
    unsigned long long int result;
    uint2& src = *reinterpret_cast<uint2*>(&result);
    src.x = l;
    src.y = h;
    return result;
}

CUDA registers are 32 bits wide in any case. In the best case, the compiler needs no additional code. In the worst case, it has to reorder registers by moving 32-bit values.

A Godbolt example showing how it gets optimized: https://godbolt.org/z/3r9WYK9e7
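Since the question also asks about PTX: as far as I know, PTX has a mov.b64 form that takes a {low, high} pair of 32-bit registers and packs them into one 64-bit register. The sketch below wraps it in inline assembly on the device path and falls back to shift-and-OR on the host. The name hilo_to_u64_ptx and the HILO_HD macro are made up for illustration, and the device path should be verified against the PTX ISA documentation for your toolkit version before you rely on it.

```cpp
#include <cstdint>

// Lets the same source build with nvcc and with a plain host C++ compiler.
#ifdef __CUDACC__
#define HILO_HD __host__ __device__
#else
#define HILO_HD
#endif

// Hypothetical helper (not a CUDA built-in): hi -> bits 63..32, lo -> bits 31..0.
HILO_HD inline std::uint64_t hilo_to_u64_ptx(std::uint32_t hi, std::uint32_t lo)
{
#ifdef __CUDA_ARCH__
    // Device path: pack two 32-bit registers into one 64-bit register.
    // The vector operand lists the low word first.
    unsigned long long result;
    asm("mov.b64 %0, {%1, %2};" : "=l"(result) : "r"(lo), "r"(hi));
    return result;
#else
    // Host path: shift-and-OR produces the same bit layout.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#endif
}
```

In practice the compiler often emits this same instruction for the shift-and-OR form, so the inline assembly is mostly useful for seeing what the packing looks like at the PTX level.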
