I have a memory wrapper for CUDA that does simple reference counting (à la shared_ptr). I compile the C++ class with nvcc; see the gist.

I then simply want to use it from my basic C++ main file:

#include "CudaMemory.h"
typedef CudaDoubleMemory GPUMemory;

int main(int argc, char** argv) {

    GPUMemory d_mem(3 * 3);

    return 0;
}

But when I compile it with nvcc, I get a lot of errors:

nvcc --shared --compiler-options -fPIC -shared src/CudaMemory.cu -o libmem.so
src/CudaMemory.cu(29): error: return value type does not match the function type
src/CudaMemory.cu(46): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(46): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(46): error: expected a "{"
src/CudaMemory.cu(47): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(49): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(49): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(49): error: expected a "{"
src/CudaMemory.cu(51): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(51): error: identifier "scalar_type" is undefined
src/CudaMemory.cu(53): error: identifier "count" is undefined
src/CudaMemory.cu(54): error: identifier "ref_id" is undefined
src/CudaMemory.cu(56): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(58): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(58): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(58): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(58): error: expected a "{"
src/CudaMemory.cu(60): error: identifier "count" is undefined
src/CudaMemory.cu(62): error: identifier "ref_id" is undefined
src/CudaMemory.cu(64): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(66): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(66): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(66): error: identifier "this_type" is undefined
src/CudaMemory.cu(68): error: identifier "count" is undefined
src/CudaMemory.cu(69): error: identifier "ref_id" is undefined
src/CudaMemory.cu(69): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(74): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(75): error: identifier "ref_id" is undefined
src/CudaMemory.cu(82): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(84): error: identifier "count" is undefined
src/CudaMemory.cu(85): error: identifier "ref_id" is undefined
src/CudaMemory.cu(85): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(89): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(89): error: incomplete type is not allowed
src/CudaMemory.cu(89): error: identifier "scalar_type" is undefined
src/CudaMemory.cu(89): error: identifier "host_ptr" is undefined
src/CudaMemory.cu(89): error: expected a ";"
At end of source: warning: parsing restarts here after previous syntax error
34 errors detected in the compilation of "/tmp/tmpxft_000018e6_00000000-4_CudaMemory.cpp1.ii".

What am I doing wrong here? I read that it has something to do with extern "C", but this is C++ code, not C code...

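Edit: does what I am doing even make sense? My impression is that template parameters are not possible in my case, because CUDA cannot do its job without knowing which types will be used.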

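How should I proceed, then, to get the intended result? Is the only solution to wrap cudaMalloc, cudaFree and cudaMemcpy in external functions that I implement in the .cu file, with everything else in the .h, so that nothing in the .cu needs to be templated (the class itself would of course be implemented in the .h)?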
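The split asked about here could look roughly like the sketch below. It is only an illustration under stated assumptions: the names device_alloc, device_free and DeviceBuffer are hypothetical and do not come from the gist, and std::malloc/std::free stand in for cudaMalloc/cudaFree so that the sketch builds without the CUDA toolkit. In the real code the two untyped functions would sit in the .cu file and the template would stay in the header.

```cpp
#include <cstddef>
#include <cstdlib>

// --- would live in the .cu file: untyped, non-templated wrappers ---
// Hypothetical names. std::malloc/std::free are stand-ins for
// cudaMalloc/cudaFree so that this sketch builds without CUDA.
void* device_alloc(std::size_t bytes) { return std::malloc(bytes); }
void device_free(void* ptr) { std::free(ptr); }

// --- would live in the .h file: only the template knows the element type ---
template <typename T>
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t count)
        : ptr_(static_cast<T*>(device_alloc(count * sizeof(T)))),
          count_(count) {}
    ~DeviceBuffer() { device_free(ptr_); }

    // Copying is disabled to keep the sketch short; the wrapper in the
    // question uses reference counting instead.
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    T* get() const { return ptr_; }
    std::size_t size() const { return count_; }

private:
    T* ptr_;
    std::size_t count_;
};
```

With this layout, only the byte-level functions need to be compiled once; DeviceBuffer<double> or any other element type is instantiated by whichever compiler builds the file that uses it.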

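Solution?: I went with the version that has everything in the header and does not need nvcc. It compiles and even runs, but it crashes with a "double free", although no duplicate free is called (the debug output shows only one). See the new gist. Since everything is now in one header file, a lot has changed.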

Now when I run the new main program:

#include "CudaMemory.h"
typedef gpu::CudaDoubleMemory GPUMemory;
#include <iostream>
int main(int argc, char** argv) {

    // testing the self adjoint eigenvalue kernel
    // selfAdjointEigensTest();
    GPUMemory d_mem(3 * 3);

    std::cout << "Memory size: " << d_mem.size() << std::endl;
    std::cout << "Memory reference: " << d_mem.get() << std::endl;
    std::cout << "Memory reference count: " << d_mem.ref_count() << std::endl;

    return 0;
}

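I get the output I asked for, but the program crashes on exit (so there seems to be a memory problem here). At least the main problem of separating the code is solved. Oh, and I had to add -lcudart so that the cuda_runtime.h functions are available when linking.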

Memory size: 9
Memory reference: 0x700100000
Memory reference count: 1
Freeing ref#0
*** glibc detected *** /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test: double free or corruption (fasttop): 0x000000000104b3f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f9bbfd99d76]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f9bbfd9eaac]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x23c32)[0x7f9bc086fc32]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x2012b)[0x7f9bc086c12b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x26d6b)[0x7f9bc0872d6b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x26f7b)[0x7f9bc0872f7b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x19e0c)[0x7f9bc0865e0c]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xa5)[0x7f9bbfd5a175]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x5b66)[0x7f9bc0851b66]
======= Memory map: ========
00400000-00408000 r-xp 00000000 08:07 24633669                           /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test
00607000-00608000 rw-p 00007000 08:07 24633669                           /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test
00f60000-0106b000 rw-p 00000000 00:00 0                                  [heap]
200000000-900000000 ---p 00000000 00:00 0 
7f9bb8000000-7f9bb8021000 rw-p 00000000 00:00 0 
7f9bb8021000-7f9bbc000000 ---p 00000000 00:00 0 
7f9bbd2f4000-7f9bbd2f5000 rw-p 00000000 00:00 0 
7f9bbd2f5000-7f9bbd3f5000 rw-s 369cd5000 00:05 5720                      /dev/nvidia0
7f9bbd3f5000-7f9bbd4f5000 rw-s 368e8c000 00:05 5720                      /dev/nvidia0
7f9bbd4f5000-7f9bbd5f5000 rw-s 368ac3000 00:05 5720                      /dev/nvidia0
7f9bbd5f5000-7f9bbd6f5000 rw-s 00000000 00:04 79644                      /dev/zero (deleted)
7f9bbd6f5000-7f9bbd7f5000 rw-s 38238d000 00:05 5720                      /dev/nvidia0
7f9bbd7f5000-7f9bbd8f5000 rw-s 00000000 00:04 79643                      /dev/zero (deleted)
7f9bbd8f5000-7f9bbd8f6000 rw-s efee6000 00:05 5720                       /dev/nvidia0
7f9bbd8f6000-7f9bbd8f7000 rw-s 382385000 00:05 5720                      /dev/nvidia0
7f9bbd8f7000-7f9bbdcf9000 rw-s 3e39b9000 00:05 5720                      /dev/nvidia0
7f9bbdcf9000-7f9bbe0fb000 rw-s 38eade000 00:05 5720                      /dev/nvidia0
7f9bbe0fb000-7f9bbe0fc000 ---p 00000000 00:00 0 
7f9bbe0fc000-7f9bbe8fc000 rwxp 00000000 00:00 0 
7f9bbe8fc000-7f9bbe912000 r-xp 00000000 08:07 3145792                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbe912000-7f9bbeb11000 ---p 00016000 08:07 3145792                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb11000-7f9bbeb12000 r--p 00015000 08:07 3145792                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb12000-7f9bbeb13000 rw-p 00016000 08:07 3145792                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb13000-7f9bbf3c0000 r-xp 00000000 08:07 13985100                   /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf3c0000-7f9bbf5c0000 ---p 008ad000 08:07 13985100                   /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf5c0000-7f9bbf6d2000 rw-p 008ad000 08:07 13985100                   /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf6d2000-7f9bbf6fb000 rw-p 00000000 00:00 0 
7f9bbf6fb000-7f9bbf702000 r-xp 00000000 08:07 3145947                    /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf702000-7f9bbf901000 ---p 00007000 08:07 3145947                    /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf901000-7f9bbf902000 r--p 00006000 08:07 3145947                    /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf902000-7f9bbf903000 rw-p 00007000 08:07 3145947                    /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf903000-7f9bbf91a000 r-xp 00000000 08:07 3145939                    /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbf91a000-7f9bbfb19000 ---p 00017000 08:07 3145939                    /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb19000-7f9bbfb1a000 r--p 00016000 08:07 3145939                    /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb1a000-7f9bbfb1b000 rw-p 00017000 08:07 3145939                    /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb1b000-7f9bbfb1f000 rw-p 00000000 00:00 0 
7f9bbfb1f000-7f9bbfb21000 r-xp 00000000 08:07 3145944                    /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfb21000-7f9bbfd21000 ---p 00002000 08:07 3145944                    /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd21000-7f9bbfd22000 r--p 00002000 08:07 3145944                    /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd22000-7f9bbfd23000 rw-p 00003000 08:07 3145944                    /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd23000-7f9bbfea3000 r-xp 00000000 08:07 3145953                    /lib/x86_64-linux-gnu/libc-2.13.so
7f9bbfea3000-7f9bc00a3000 ---p 00180000 08:07 3145953                    /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a3000-7f9bc00a7000 r--p 00180000 08:07 3145953                    /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a7000-7f9bc00a8000 rw-p 00184000 08:07 3145953                    /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a8000-7f9bc00ad000 rw-p 00000000 00:00 0 
7f9bc00ad000-7f9bc00c2000 r-xp 00000000 08:07 3145790                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc00c2000-7f9bc02c2000 ---p 00015000 08:07 3145790                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc02c2000-7f9bc02c3000 rw-p 00015000 08:07 3145790                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc02c3000-7f9bc0344000 r-xp 00000000 08:07 3145949                    /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0344000-7f9bc0543000 ---p 00081000 08:07 3145949                    /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0543000-7f9bc0544000 r--p 00080000 08:07 3145949                    /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0544000-7f9bc0545000 rw-p 00081000 08:07 3145949                    /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0545000-7f9bc062d000 r-xp 00000000 08:07 13986699                   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc062d000-7f9bc082d000 ---p 000e8000 08:07 13986699                   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc082d000-7f9bc0835000 r--p 000e8000 08:07 13986699                   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc0835000-7f9bc0837000 rw-p 000f0000 08:07 13986699                   /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc0837000-7f9bc084c000 rw-p 00000000 00:00 0 
7f9bc084c000-7f9bc08a7000 r-xp 00000000 08:07 13985818                   /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc08a7000-7f9bc0aa7000 ---p 0005b000 08:07 13985818                   /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa7000-7f9bc0aa8000 r--p 0005b000 08:07 13985818                   /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa8000-7f9bc0aa9000 rw-p 0005c000 08:07 13985818                   /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa9000-7f9bc0aaa000 rw-p 00000000 00:00 0 
7f9bc0aaa000-7f9bc0ac3000 r-xp 00000000 08:07 24627226                   /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0ac3000-7f9bc0cc3000 ---p 00019000 08:07 24627226                   /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0cc3000-7f9bc0cc4000 rw-p 00019000 08:07 24627226                   /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0cc4000-7f9bc0ce4000 r-xp 00000000 08:07 3145957                    /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0d9c000-7f9bc0dbd000 rw-p 00000000 00:00 0 
7f9bc0dbd000-7f9bc0ebd000 rw-s 00000000 00:04 79639                      /dev/zero (deleted)
7f9bc0ebd000-7f9bc0ec4000 rw-p 00000000 00:00 0 
7f9bc0ede000-7f9bc0edf000 rw-s efee5000 00:05 5720                       /dev/nvidia0
7f9bc0edf000-7f9bc0ee0000 rw-s 38eba1000 00:05 5720                      /dev/nvidia0
7f9bc0ee0000-7f9bc0ee1000 r--s f2009000 00:05 5720                       /dev/nvidia0
7f9bc0ee1000-7f9bc0ee3000 rw-p 00000000 00:00 0 
7f9bc0ee3000-7f9bc0ee4000 r--p 0001f000 08:07 3145957                    /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0ee4000-7f9bc0ee5000 rw-p 00020000 08:07 3145957                    /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0ee5000-7f9bc0ee6000 rw-p 00000000 00:00 0 
7fff2b5ec000-7fff2b60c000 rwxp 00000000 00:00 0                          [stack]
7fff2b60c000-7fff2b60d000 rw-p 00000000 00:00 0 
7fff2b78c000-7fff2b78d000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

RUN FINISHED; Aborted; real time: 50ms; user: 0ms; system: 0ms

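Final working solution: the previous gist had two errors, which the new gist fixes. Mainly:

  1. The data in gpu::internal must be static (one member was not; I don't know why, I guess it was a typo...).
  2. In newReference, when a previously freed entry is reused, the reference count must be initialized to 1 (the same way as when no freed entry is available).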

With that, it is now completely solved. I also added countReferences to check at the end that nothing has leaked (so far, in my tests, nothing has).
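A minimal sketch of the bookkeeping described by the two fixes, not the code from the gist: the struct name ReferenceTable, the member counts and the release function are assumptions made for illustration, while newReference, freeIndexes and countReferences are the names mentioned in this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the bookkeeping kept in gpu::internal.
struct ReferenceTable {
    // Static members: one table shared by every wrapper instance.
    static std::vector<int> counts;               // reference count per slot
    static std::vector<std::size_t> freeIndexes;  // slots available for reuse

    // Returns the id of a slot whose count starts at 1.
    static std::size_t newReference() {
        if (!freeIndexes.empty()) {
            std::size_t id = freeIndexes.back();  // read the index first,
            freeIndexes.pop_back();               // then remove it
            counts[id] = 1;  // a reused slot starts at 1, like a new one
            return id;
        }
        counts.push_back(1);
        return counts.size() - 1;
    }

    // Drops one reference; returns true when the slot reached zero,
    // which is where the caller would free the device memory.
    static bool release(std::size_t id) {
        if (--counts[id] == 0) {
            freeIndexes.push_back(id);
            return true;
        }
        return false;
    }

    // Number of slots still referenced; 0 at exit means nothing leaked.
    static std::size_t countReferences() {
        std::size_t live = 0;
        for (std::size_t i = 0; i < counts.size(); ++i) {
            if (counts[i] > 0) ++live;
        }
        return live;
    }
};

// Out-of-class definitions of the static members.
std::vector<int> ReferenceTable::counts;
std::vector<std::size_t> ReferenceTable::freeIndexes;
```

If the reused slot kept its old count of 0, the first release would decrement it to -1 and the memory would never be freed, which is the kind of error the second fix addresses.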

Conclusion: when there is no device code, we can generally compile without nvcc; we only need to include cuda_runtime.h to call the cudaXXX functions. Thanks to @Robert Crovella.

2 Answers

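My working solution:

  1. Remove all non-nvcc errors and make sure it works without CUDA-specific code
  2. Extract all non-nvcc code into a .h header
  3. Remove the nvcc-specific code and use cuda_runtime.h instead (and link with -lcudart)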

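nvcc can handle templates, I am not saying otherwise, but I do not think it is possible to have a class definition that is a template and an implementation that is compiled only once: templates are meant to be instantiated, and the compiler generates code for each new instantiation, so it makes no sense to compile template code with nvcc and then instantiate it with g++.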
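To illustrate the point about instantiation, here is a small sketch with hypothetical names (Buffer, doubleBytes, charBytes) that have nothing to do with the gist. The member definitions sit next to the class template, as they would in a header, so the compiler that builds each using file can generate the code for the types that file needs.

```cpp
#include <cstddef>

// Would live in a header: declaration and member definitions together,
// so every translation unit that names Buffer<T> can instantiate it.
template <typename T>
class Buffer {
public:
    explicit Buffer(std::size_t count) : count_(count) {}
    std::size_t bytes() const { return count_ * sizeof(T); }

private:
    std::size_t count_;
};

// Buffer<double>::bytes and Buffer<char>::bytes are generated separately,
// each at the point where it is first used.
inline std::size_t doubleBytes(std::size_t n) { return Buffer<double>(n).bytes(); }
inline std::size_t charBytes(std::size_t n) { return Buffer<char>(n).bytes(); }
```

Standard C++ also offers explicit instantiation (for example `template class Buffer<double>;` in one source file) when the set of types is fixed in advance; that is a different approach from the header-only one taken in this answer.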

It works.

answered 2013-03-11T23:00:54.643

I fixed the first two compilation errors:
1. freeIndexes.pop_back() returns void, not int...
2. You should #include <iostream>
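The first fix can be sketched as follows. The helper name takeFreeIndex is hypothetical, and the element type int is an assumption based on the error message; only freeIndexes is a name from the original code.

```cpp
#include <vector>

// Hypothetical helper: takes the last free index off the list.
// Precondition: freeIndexes is not empty.
int takeFreeIndex(std::vector<int>& freeIndexes) {
    int index = freeIndexes.back();  // back() returns the last element
    freeIndexes.pop_back();          // pop_back() only removes it (returns void)
    return index;
}
```

Writing `return freeIndexes.pop_back();` in a function declared to return int is what produces "return value type does not match the function type".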

answered 2013-03-11T14:49:40.057