c++ - How to define a utility function that can be called from both CUDA kernels and regular C++ functions

Question

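I am working on a project that involves a lot of math. For each target problem (for example, gradient computation), we always have two implementations: a CPU version and a CUDA version.

Currently the CPU version is written in regular C++ and the kernel version in CUDA. If I want to define a small function, for example vec_weight, which returns the weight of a vector, I have to write it twice: one version for the CPU, compiled by g++, and one CUDA version with "__device__" in front of it, compiled by nvcc.

I don't want to define a "__device__ __host__" function here. What I want is a kind of library that can be called from both regular C++ functions and CUDA kernels. I tried using the "__CUDACC__" macro, but it didn't work.

Because we will have many small utility functions that both the CPU and GPU versions need, I think it is reasonable to merge them into one.

Writing the CPU version in a .cu file instead of a .cpp file might solve our problem, but that is not what we want.

So what should I do?

Here is the code snippet: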

head.h:

#ifndef HEAD_H
#define HEAD_H
#ifdef __cplusplus
extern "C"{
#endif
__device__ __host__ void myprint();
#ifdef __cplusplus
}
#endif
#endif

head.cu:

#include "head.h"
#include <stdio.h>
void myprint(){
// do something here
}

main.cpp:

#include "head.h"
int main(){
myprint();
}

I compiled head.cu with:

nvcc -c head.cu

and linked everything together with:

g++ main.cpp head.o -o main   ( The reason that I didn't use nvcc here is that we are using the PGI's pgcpp in our project and we need it to talk to the PGI's OMP library. But I'm sure that there is something wrong here but I don't know how to fix that. )

Error messages:

In file included from main.cpp:18:
head.h:6: error: ‘__device__’ does not name a type
main.cpp: In function ‘int main()’:
main.cpp:20: error: ‘myprint’ was not declared in this scope

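So I am fairly sure that g++ does not recognize the "__device__" prefix here. But our project requires us to compile the cpp files with PGCPP, because that is the only way to make the omp directives work in both Fortran and C (our project mixes C/C++, Fortran and CUDA). Since even g++ does not work here, I think we have to solve this problem first.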

score 3 · Accepted Answer

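A library usually contains code compiled for a target processor (CPU or GPU), so you need to compile it with NVCC. You may therefore just as well put it in a .cu file.

If you can release the source code, you can put the code in a header and include it from either .cpp or .cu files.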

Update

This is what I did in my code (hdf is a function that can be called from both host and device):

File devhost.h:

#ifndef FUNCTIONS_H_
#define FUNCTIONS_H_

int myhost_function(int);

#endif

File cu.cu:

__host__ __device__
int hdf(int a) {
    return a + 4;
}

int myhost_function(int a) {
    return hdf(a);
}

__global__
void kern(int *data) {
    data[threadIdx.x] = hdf(data[threadIdx.x]);
}

File cpp.cpp:

#include <stdio.h>
#include <stdlib.h>

#include "devhost.h"

int main() {
    printf ("%d\n", myhost_function(5));
    return 0;
}

This is how I compile and link it:

nvcc -c cu.cu
gcc -c cpp.cpp
gcc cpp.o cu.o -lcudart -L/usr/local/cuda-5.5/lib64

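Note that you need to link against CUDART because the .cu file contains device calls.

Update 2

A slightly less elegant approach that still seems to compile is to put the following in your header file: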

#ifdef __CUDACC__
__host__ __device__
#endif
static int anotherfunction(int a) {
    return a * 50;
}

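In this case you will have a copy of the code in every translation unit, which will increase your compile time and may increase the size of the executable.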
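When there are many such helpers, the guard can be wrapped once in a macro instead of being repeated before every function. The following is only a sketch of that idea, not something from the original project: the macro name HOST_DEVICE, the header name util.h, and the body of vec_weight are all assumptions made for illustration. It relies on nvcc defining __CUDACC__ while it compiles CUDA sources, so a host-only compiler such as g++ sees plain functions with no CUDA qualifiers.

```cpp
// util.h (hypothetical header name): helpers shared by .cpp and .cu files
#ifndef UTIL_H_
#define UTIL_H_

// nvcc defines __CUDACC__ when compiling CUDA sources; host-only
// compilers do not, so for them the macro expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Same function as in the snippet above, marked inline instead of static.
HOST_DEVICE inline int anotherfunction(int a) {
    return a * 50;
}

// Hypothetical stand-in for the vec_weight helper from the question:
// the formula is made up for this example.
HOST_DEVICE inline double vec_weight(double x, double y, double z) {
    return 1.0 / (1.0 + x * x + y * y + z * z);
}

#endif  // UTIL_H_
```

A .cpp file compiled by a host compiler and a .cu file compiled by nvcc could both include this header and call anotherfunction or vec_weight. Using inline rather than static is a design choice in this sketch: every translation unit still compiles the function, but the linker is allowed to merge identical host copies.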
