cuda - Is there a list of headers that can be compiled with NVRTC from within a string?

Question

(Using the NVRTC runtime compiler)

I have a CUDA function stored in a string:

R"(
        extern "C" __global__ void test1(float * a, float * b, float *c)
        {
            int id= blockIdx.x * blockDim.x + threadIdx.x;
            c[id]=a[id]+b[id];
        }
)"

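This compiles to PTX successfully and is used in the program, through the driver API, to compute c = a + b.

But when I try to include a header to pull in an algorithm: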

R"(
        #include <climits>

        extern "C" __global__ void test1(float * a, float * b, float *c, int * gpuOffset)
        {
            int id=blockIdx.x * blockDim.x + threadIdx.x;
            device_vector<int> dv;
            c[id]=a[id]+b[id];
        }


)"

it returns an error saying

test1.cu(23): catastrophic error: cannot open source file "climits"

1 catastrophic error detected in the compilation of "test1.cu".
Compilation terminated.

or

test1.cu(28): error: identifier "device_vector" is undefined

depending on which header is included or which class is used (for example device_vector).
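A note on the first error: as far as I can tell, NVRTC does not search the host compiler's include directories on its own, so a header only becomes visible through an include-path option or by handing its text to the compiler directly. `nvrtcCreateProgram` accepts header contents and include names as two parallel arrays of C strings. Below is a minimal sketch of preparing those arrays. `InMemoryHeaders`, `makeExampleHeaders` and `block_add.h` are hypothetical names made up for illustration, and the actual NVRTC call is only shown in a comment so the snippet builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of NVRTC): owns header names and texts and
// exposes them as the two parallel C-string arrays that nvrtcCreateProgram
// takes as its `headers` and `includeNames` arguments.
class InMemoryHeaders {
public:
    // `name` is the text that appears inside #include "...";
    // `source` is the complete header text.
    void add(std::string name, std::string source) {
        assert(!name.empty());
        names_.push_back(std::move(name));
        sources_.push_back(std::move(source));
    }

    int count() const { return static_cast<int>(names_.size()); }

    // The returned pointers stay valid only until the next call to add().
    std::vector<const char*> includeNames() const { return pointers(names_); }
    std::vector<const char*> headers() const { return pointers(sources_); }

private:
    static std::vector<const char*> pointers(const std::vector<std::string>& v) {
        std::vector<const char*> out;
        out.reserve(v.size());
        for (const std::string& s : v) out.push_back(s.c_str());
        return out;
    }

    std::vector<std::string> names_;
    std::vector<std::string> sources_;
};

// Example set with one made-up device-side header.
inline InMemoryHeaders makeExampleHeaders() {
    InMemoryHeaders h;
    h.add("block_add.h",
          "__device__ inline float block_add(float x, float y) { return x + y; }\n");
    return h;
}

// With the CUDA toolkit available, the arrays would be passed like this:
//   InMemoryHeaders h = makeExampleHeaders();
//   auto texts = h.headers();
//   auto names = h.includeNames();
//   nvrtcProgram prog;
//   nvrtcCreateProgram(&prog, kernelSource, "test1.cu",
//                      h.count(), texts.data(), names.data());
```

The kernel string could then use `#include "block_add.h"`. This only helps for headers whose contents are valid device code. It would not make a host-side class such as device_vector usable inside a kernel.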

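Also, the documentation says that cuFFT and Thrust can both only be used from the host side, so it seems I cannot use any "partial" algorithm that I want to run independently on each thread block.

Is there a list of headers for CUDA-capable algorithms that can be used per block, like this: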

R"(
        #include "driver_api_fft.h"
        #include "driver_api_ifft.h"
        extern "C" __global__ void test1(float * a, float * b, float *c)
        {
            int id=blockIdx.x * blockDim.x + threadIdx.x;
            fft(a,id,1024);
            ifft(b,id,1024);
            c[id]=a[id]+b[id];
        }
)"

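and that compile and run successfully on any target machine? Or is it possible to link these algorithm libraries (Thrust, for device_vector) from the host side into the PTX linker, so that I can somehow use them from the compiled kernel? If neither is possible, do I need to write the Fourier transform myself and make it "fast" by implementing the algorithm on my own?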
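To make the last option concrete, here is a minimal sketch of a hand-written radix-2 FFT that should need no headers under NVRTC. It is an illustration under assumptions, not a tuned implementation: `fft_radix2`, `FFT_FN` and `FFT_ASSERT` are hypothetical names, the transform runs serially inside one thread rather than cooperatively across a block, and the length must be a power of two. The `__CUDACC_RTC__` guard (a macro NVRTC defines) skips the host-only includes, so the same text also builds with an ordinary C++ compiler for testing.

```cpp
// Host compilers need these headers; under NVRTC they are skipped and
// sinf/cosf come from the built-in CUDA math functions.
#ifndef __CUDACC_RTC__
#include <assert.h>
#include <math.h>
#define FFT_ASSERT(x) assert(x)
#else
#define FFT_ASSERT(x)
#endif

// Function qualifiers only exist when a CUDA compiler processes the code.
#if defined(__CUDACC_RTC__)
#define FFT_FN __device__
#elif defined(__CUDACC__)
#define FFT_FN __host__ __device__
#else
#define FFT_FN
#endif

// In-place iterative radix-2 FFT on separate real/imaginary arrays.
// n must be a power of two. inverse == true computes the inverse
// transform, including the 1/n scaling.
FFT_FN inline void fft_radix2(float* re, float* im, int n, bool inverse)
{
    FFT_ASSERT(n > 0 && (n & (n - 1)) == 0);

    // Bit-reversal permutation.
    for (int i = 1, j = 0; i < n; ++i) {
        int bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) {
            float t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }

    // Butterfly passes over sub-transforms of length 2, 4, ..., n.
    const float pi = 3.14159265358979f;
    for (int len = 2; len <= n; len <<= 1) {
        const float ang = (inverse ? 2.0f : -2.0f) * pi / (float)len;
        for (int start = 0; start < n; start += len) {
            for (int k = 0; k < len / 2; ++k) {
                const float wr = cosf(ang * (float)k);
                const float wi = sinf(ang * (float)k);
                const int a = start + k;
                const int b = a + len / 2;
                const float xr = re[b] * wr - im[b] * wi;
                const float xi = re[b] * wi + im[b] * wr;
                re[b] = re[a] - xr;
                im[b] = im[a] - xi;
                re[a] += xr;
                im[a] += xi;
            }
        }
    }

    if (inverse) {
        for (int i = 0; i < n; ++i) {
            re[i] /= (float)n;
            im[i] /= (float)n;
        }
    }
}
```

Inside a kernel string each thread could call `fft_radix2` on its own slice of the data. A version where the threads of a block cooperate on one transform would need shared memory and `__syncthreads()`, which this sketch does not attempt.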
