I have a memory wrapper for CUDA that does simple reference counting (à la shared_ptr). I compile the C++ class with nvcc; see the gist.
Then I simply want to use it from my basic C++ main file:
#include "CudaMemory.h"

typedef CudaDoubleMemory GPUMemory;

int main(int argc, char** argv) {
    GPUMemory d_mem(3 * 3);
    return 0;
}
But when I compile it with nvcc, I get a lot of errors:
nvcc --shared --compiler-options -fPIC -shared src/CudaMemory.cu -o libmem.so
src/CudaMemory.cu(29): error: return value type does not match the function type
src/CudaMemory.cu(46): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(46): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(46): error: expected a "{"
src/CudaMemory.cu(47): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(49): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(49): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(49): error: expected a "{"
src/CudaMemory.cu(51): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(51): error: identifier "scalar_type" is undefined
src/CudaMemory.cu(53): error: identifier "count" is undefined
src/CudaMemory.cu(54): error: identifier "ref_id" is undefined
src/CudaMemory.cu(56): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(58): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(58): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(58): error: explicit type is missing ("int" assumed)
src/CudaMemory.cu(58): error: expected a "{"
src/CudaMemory.cu(60): error: identifier "count" is undefined
src/CudaMemory.cu(62): error: identifier "ref_id" is undefined
src/CudaMemory.cu(64): warning: missing return statement at end of non-void function "CudaMemory"
src/CudaMemory.cu(66): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(66): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(66): error: identifier "this_type" is undefined
src/CudaMemory.cu(68): error: identifier "count" is undefined
src/CudaMemory.cu(69): error: identifier "ref_id" is undefined
src/CudaMemory.cu(69): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(74): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(75): error: identifier "ref_id" is undefined
src/CudaMemory.cu(82): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(84): error: identifier "count" is undefined
src/CudaMemory.cu(85): error: identifier "ref_id" is undefined
src/CudaMemory.cu(85): error: identifier "d_ptr" is undefined
src/CudaMemory.cu(89): error: argument list for class template "CudaMemory" is missing
src/CudaMemory.cu(89): error: incomplete type is not allowed
src/CudaMemory.cu(89): error: identifier "scalar_type" is undefined
src/CudaMemory.cu(89): error: identifier "host_ptr" is undefined
src/CudaMemory.cu(89): error: expected a ";"
At end of source: warning: parsing restarts here after previous syntax error
34 errors detected in the compilation of "/tmp/tmpxft_000018e6_00000000-4_CudaMemory.cpp1.ii".
What am I doing wrong here? I have read that it may have something to do with extern "C", but this is C++ code, not C code...
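For reference, messages such as `argument list for class template "CudaMemory" is missing` and `identifier "scalar_type" is undefined` are what a compiler typically reports when member functions of a class template are defined outside the class without the `template <...>` header and the `Class<T>::` qualifier. The gist is not reproduced here, so this is only a minimal sketch under that assumption. The class `Buffer` is hypothetical (it is not the real `CudaMemory`), and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the sketch builds without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the real wrapper; not the code from the gist.
template <typename T>
class Buffer {
public:
    typedef T scalar_type;

    explicit Buffer(std::size_t count);
    ~Buffer();

    std::size_t size() const;
    scalar_type* get() const;

private:
    // Copying is disabled to keep the sketch short (no reference counting here).
    Buffer(const Buffer&);
    Buffer& operator=(const Buffer&);

    scalar_type* ptr_;
    std::size_t count_;
};

// Each out-of-class definition repeats the template header and
// qualifies the member with Buffer<T>::, not plain Buffer::.
template <typename T>
Buffer<T>::Buffer(std::size_t count)
    : ptr_(static_cast<T*>(std::malloc(count * sizeof(T)))),  // cudaMalloc in real code
      count_(count) {
    assert(ptr_ != NULL || count == 0);
}

template <typename T>
Buffer<T>::~Buffer() {
    std::free(ptr_);  // cudaFree in real code
}

template <typename T>
std::size_t Buffer<T>::size() const { return count_; }

// A return type that depends on T needs 'typename' and full qualification.
template <typename T>
typename Buffer<T>::scalar_type* Buffer<T>::get() const { return ptr_; }

// Explicit instantiation: if the definitions above live in a .cu/.cpp file,
// this generates the double version for other translation units to link against.
template class Buffer<double>;
typedef Buffer<double> DoubleBuffer;
```

With this layout, `DoubleBuffer b(3 * 3);` gives `b.size() == 9`. The explicit instantiation line is only needed when the member definitions are kept out of the header; none of this involves `extern "C"`.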
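Edit: Does what I am doing even make sense? My impression is that template parameters are not possible in my case, because CUDA cannot do its job without knowing which type will be used.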
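How should I proceed to get the intended result, then? Is the only solution to wrap cudaMalloc, cudaFree and cudaMemcpy in external functions that I implement in the .cu file, with everything else in the .h, so that no templates are needed in the .cu (I would of course implement the class itself in the .h)?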
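One way to read the split described above is sketched here: the header holds the class template and only declares a few non-template allocation functions, which a single source file then defines. The names `gpu_sketch`, `gpu_alloc_bytes`, `gpu_free_bytes`, `liveAllocations` and `DeviceArray` are all hypothetical, and the function bodies use `std::malloc`/`std::free` as placeholders for `cudaMalloc`/`cudaFree` so that the sketch runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace gpu_sketch {

// ---- would go in the header: declarations only ----
void* gpu_alloc_bytes(std::size_t bytes);  // hypothetical name
void gpu_free_bytes(void* ptr);            // hypothetical name

// The template stays in the header and only forwards to the functions above.
template <typename T>
class DeviceArray {
public:
    explicit DeviceArray(std::size_t count)
        : ptr_(static_cast<T*>(gpu_alloc_bytes(count * sizeof(T)))),
          count_(count) {
        assert(ptr_ != NULL);
    }
    ~DeviceArray() { gpu_free_bytes(ptr_); }

    T* get() const { return ptr_; }
    std::size_t size() const { return count_; }

private:
    // Non-copyable in this sketch, so each allocation is freed exactly once.
    DeviceArray(const DeviceArray&);
    DeviceArray& operator=(const DeviceArray&);

    T* ptr_;
    std::size_t count_;
};

// ---- would go in one source file ----
// Placeholder bodies: real code would call cudaMalloc / cudaFree here.
static int live_allocations = 0;  // bookkeeping for the sketch only

void* gpu_alloc_bytes(std::size_t bytes) {
    void* p = std::malloc(bytes > 0 ? bytes : 1);
    if (p != NULL) ++live_allocations;
    return p;
}

void gpu_free_bytes(void* ptr) {
    if (ptr != NULL) {
        std::free(ptr);
        --live_allocations;
    }
}

int liveAllocations() { return live_allocations; }

}  // namespace gpu_sketch
```

Because the template never calls the runtime directly, the source file that defines the two functions needs no template code at all, which is the property the question is after.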
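Solution?: I went with the version where everything lives in the header, which does not require nvcc. It compiles and even runs, but it crashes with a "double free", even though no double free is issued (the debug output shows only one). See the new gist. A lot has changed, since everything is now in a single header file.
Now when I run the new main program: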
#include "CudaMemory.h"

typedef gpu::CudaDoubleMemory GPUMemory;

#include <iostream>

int main(int argc, char** argv) {
    // testing the self adjoint eigenvalue kernel
    // selfAdjointEigensTest();
    GPUMemory d_mem(3 * 3);
    std::cout << "Memory size: " << d_mem.size() << std::endl;
    std::cout << "Memory reference: " << d_mem.get() << std::endl;
    std::cout << "Memory reference count: " << d_mem.ref_count() << std::endl;
    return 0;
}
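I get the results I asked for, but the program crashes on exit (so there seems to be a memory problem here). At least the main problem of separating the code is solved. Oh, and I had to add -lcudart so that the functions declared in cuda_runtime.h can be resolved at link time.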
Memory size: 9
Memory reference: 0x700100000
Memory reference count: 1
Freeing ref#0
*** glibc detected *** /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test: double free or corruption (fasttop): 0x000000000104b3f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f9bbfd99d76]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f9bbfd9eaac]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x23c32)[0x7f9bc086fc32]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x2012b)[0x7f9bc086c12b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x26d6b)[0x7f9bc0872d6b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x26f7b)[0x7f9bc0872f7b]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x19e0c)[0x7f9bc0865e0c]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xa5)[0x7f9bbfd5a175]
/usr/lib/x86_64-linux-gnu/libcudart.so.4(+0x5b66)[0x7f9bc0851b66]
======= Memory map: ========
00400000-00408000 r-xp 00000000 08:07 24633669 /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test
00607000-00608000 rw-p 00007000 08:07 24633669 /home/alexandre/NetBeansProjects/GPU_TEST/dist/Debug/GNU-Linux-x86/gpu_test
00f60000-0106b000 rw-p 00000000 00:00 0 [heap]
200000000-900000000 ---p 00000000 00:00 0
7f9bb8000000-7f9bb8021000 rw-p 00000000 00:00 0
7f9bb8021000-7f9bbc000000 ---p 00000000 00:00 0
7f9bbd2f4000-7f9bbd2f5000 rw-p 00000000 00:00 0
7f9bbd2f5000-7f9bbd3f5000 rw-s 369cd5000 00:05 5720 /dev/nvidia0
7f9bbd3f5000-7f9bbd4f5000 rw-s 368e8c000 00:05 5720 /dev/nvidia0
7f9bbd4f5000-7f9bbd5f5000 rw-s 368ac3000 00:05 5720 /dev/nvidia0
7f9bbd5f5000-7f9bbd6f5000 rw-s 00000000 00:04 79644 /dev/zero (deleted)
7f9bbd6f5000-7f9bbd7f5000 rw-s 38238d000 00:05 5720 /dev/nvidia0
7f9bbd7f5000-7f9bbd8f5000 rw-s 00000000 00:04 79643 /dev/zero (deleted)
7f9bbd8f5000-7f9bbd8f6000 rw-s efee6000 00:05 5720 /dev/nvidia0
7f9bbd8f6000-7f9bbd8f7000 rw-s 382385000 00:05 5720 /dev/nvidia0
7f9bbd8f7000-7f9bbdcf9000 rw-s 3e39b9000 00:05 5720 /dev/nvidia0
7f9bbdcf9000-7f9bbe0fb000 rw-s 38eade000 00:05 5720 /dev/nvidia0
7f9bbe0fb000-7f9bbe0fc000 ---p 00000000 00:00 0
7f9bbe0fc000-7f9bbe8fc000 rwxp 00000000 00:00 0
7f9bbe8fc000-7f9bbe912000 r-xp 00000000 08:07 3145792 /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbe912000-7f9bbeb11000 ---p 00016000 08:07 3145792 /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb11000-7f9bbeb12000 r--p 00015000 08:07 3145792 /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb12000-7f9bbeb13000 rw-p 00016000 08:07 3145792 /lib/x86_64-linux-gnu/libz.so.1.2.7
7f9bbeb13000-7f9bbf3c0000 r-xp 00000000 08:07 13985100 /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf3c0000-7f9bbf5c0000 ---p 008ad000 08:07 13985100 /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf5c0000-7f9bbf6d2000 rw-p 008ad000 08:07 13985100 /usr/lib/x86_64-linux-gnu/libcuda.so.304.64
7f9bbf6d2000-7f9bbf6fb000 rw-p 00000000 00:00 0
7f9bbf6fb000-7f9bbf702000 r-xp 00000000 08:07 3145947 /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf702000-7f9bbf901000 ---p 00007000 08:07 3145947 /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf901000-7f9bbf902000 r--p 00006000 08:07 3145947 /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf902000-7f9bbf903000 rw-p 00007000 08:07 3145947 /lib/x86_64-linux-gnu/librt-2.13.so
7f9bbf903000-7f9bbf91a000 r-xp 00000000 08:07 3145939 /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbf91a000-7f9bbfb19000 ---p 00017000 08:07 3145939 /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb19000-7f9bbfb1a000 r--p 00016000 08:07 3145939 /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb1a000-7f9bbfb1b000 rw-p 00017000 08:07 3145939 /lib/x86_64-linux-gnu/libpthread-2.13.so
7f9bbfb1b000-7f9bbfb1f000 rw-p 00000000 00:00 0
7f9bbfb1f000-7f9bbfb21000 r-xp 00000000 08:07 3145944 /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfb21000-7f9bbfd21000 ---p 00002000 08:07 3145944 /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd21000-7f9bbfd22000 r--p 00002000 08:07 3145944 /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd22000-7f9bbfd23000 rw-p 00003000 08:07 3145944 /lib/x86_64-linux-gnu/libdl-2.13.so
7f9bbfd23000-7f9bbfea3000 r-xp 00000000 08:07 3145953 /lib/x86_64-linux-gnu/libc-2.13.so
7f9bbfea3000-7f9bc00a3000 ---p 00180000 08:07 3145953 /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a3000-7f9bc00a7000 r--p 00180000 08:07 3145953 /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a7000-7f9bc00a8000 rw-p 00184000 08:07 3145953 /lib/x86_64-linux-gnu/libc-2.13.so
7f9bc00a8000-7f9bc00ad000 rw-p 00000000 00:00 0
7f9bc00ad000-7f9bc00c2000 r-xp 00000000 08:07 3145790 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc00c2000-7f9bc02c2000 ---p 00015000 08:07 3145790 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc02c2000-7f9bc02c3000 rw-p 00015000 08:07 3145790 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f9bc02c3000-7f9bc0344000 r-xp 00000000 08:07 3145949 /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0344000-7f9bc0543000 ---p 00081000 08:07 3145949 /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0543000-7f9bc0544000 r--p 00080000 08:07 3145949 /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0544000-7f9bc0545000 rw-p 00081000 08:07 3145949 /lib/x86_64-linux-gnu/libm-2.13.so
7f9bc0545000-7f9bc062d000 r-xp 00000000 08:07 13986699 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc062d000-7f9bc082d000 ---p 000e8000 08:07 13986699 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc082d000-7f9bc0835000 r--p 000e8000 08:07 13986699 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc0835000-7f9bc0837000 rw-p 000f0000 08:07 13986699 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7f9bc0837000-7f9bc084c000 rw-p 00000000 00:00 0
7f9bc084c000-7f9bc08a7000 r-xp 00000000 08:07 13985818 /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc08a7000-7f9bc0aa7000 ---p 0005b000 08:07 13985818 /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa7000-7f9bc0aa8000 r--p 0005b000 08:07 13985818 /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa8000-7f9bc0aa9000 rw-p 0005c000 08:07 13985818 /usr/lib/x86_64-linux-gnu/libcudart.so.4.2.9
7f9bc0aa9000-7f9bc0aaa000 rw-p 00000000 00:00 0
7f9bc0aaa000-7f9bc0ac3000 r-xp 00000000 08:07 24627226 /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0ac3000-7f9bc0cc3000 ---p 00019000 08:07 24627226 /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0cc3000-7f9bc0cc4000 rw-p 00019000 08:07 24627226 /home/alexandre/NetBeansProjects/GPU_LIB/libgpu.so
7f9bc0cc4000-7f9bc0ce4000 r-xp 00000000 08:07 3145957 /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0d9c000-7f9bc0dbd000 rw-p 00000000 00:00 0
7f9bc0dbd000-7f9bc0ebd000 rw-s 00000000 00:04 79639 /dev/zero (deleted)
7f9bc0ebd000-7f9bc0ec4000 rw-p 00000000 00:00 0
7f9bc0ede000-7f9bc0edf000 rw-s efee5000 00:05 5720 /dev/nvidia0
7f9bc0edf000-7f9bc0ee0000 rw-s 38eba1000 00:05 5720 /dev/nvidia0
7f9bc0ee0000-7f9bc0ee1000 r--s f2009000 00:05 5720 /dev/nvidia0
7f9bc0ee1000-7f9bc0ee3000 rw-p 00000000 00:00 0
7f9bc0ee3000-7f9bc0ee4000 r--p 0001f000 08:07 3145957 /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0ee4000-7f9bc0ee5000 rw-p 00020000 08:07 3145957 /lib/x86_64-linux-gnu/ld-2.13.so
7f9bc0ee5000-7f9bc0ee6000 rw-p 00000000 00:00 0
7fff2b5ec000-7fff2b60c000 rwxp 00000000 00:00 0 [stack]
7fff2b60c000-7fff2b60d000 rw-p 00000000 00:00 0
7fff2b78c000-7fff2b78d000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
RUN FINISHED; Aborted; real time: 50ms; user: 0ms; system: 0ms
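Final working solution: there were two bugs in the previous gist, and the new gist fixes them. Mainly:
- The data in gpu::internal must be static (one member was not; I don't know why, a typo I guess...).
- In newReference, when an old freed entry is reused, the reference count must be initialized to 1 (the same as when no freed entry is available).
With that, the problem is now completely solved. I also added countReferences to check that nothing has leaked at the end (so far, in my tests, there are no leaks).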
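The second fix can be illustrated with a small sketch of a reference table. Only the names `newReference` and `countReferences` come from the post; the namespace `refsketch`, the `Entry` struct, `addReference`, `release` and the table layout are assumptions, and the tracked pointer is an opaque `void*` rather than real device memory. Note also that the sketch keeps its table in a function-local static inside an inline function, which is a different technique from the `static` data in `gpu::internal` mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace refsketch {  // hypothetical; not the gist's gpu::internal

struct Entry {
    void* ptr;   // pointer being tracked (opaque here)
    int count;   // 0 means the slot is free and can be reused
};

// One table shared by every translation unit that includes this code,
// because the function is inline and the vector is a local static.
inline std::vector<Entry>& table() {
    static std::vector<Entry> entries;
    return entries;
}

// Registers ptr and returns its slot id; the count starts at 1 on both paths.
inline std::size_t newReference(void* ptr) {
    std::vector<Entry>& t = table();
    for (std::size_t id = 0; id < t.size(); ++id) {
        if (t[id].count == 0) {   // reuse a freed slot
            t[id].ptr = ptr;
            t[id].count = 1;      // reset the count, same as for a new slot
            return id;
        }
    }
    Entry e = { ptr, 1 };
    t.push_back(e);
    return t.size() - 1;
}

inline void addReference(std::size_t id) { ++table()[id].count; }

// Drops one reference; returns true when the caller should free the memory.
inline bool release(std::size_t id) {
    Entry& e = table()[id];
    assert(e.count > 0);
    if (--e.count == 0) {
        e.ptr = NULL;
        return true;
    }
    return false;
}

// Number of slots still referenced; 0 at exit means nothing leaked.
inline std::size_t countReferences() {
    std::size_t live = 0;
    const std::vector<Entry>& t = table();
    for (std::size_t id = 0; id < t.size(); ++id)
        if (t[id].count > 0) ++live;
    return live;
}

}  // namespace refsketch
```

If the reused slot kept its old count of 0, the first `release` on it would decrement below zero and the memory would never be reported as ready to free, so a leak check like `countReferences` would be unreliable.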
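Conclusion: when there is no device code, we can generally compile without nvcc; we only need to include cuda_runtime.h to call the cudaXXX functions (and link against cudart). Thanks to @Robert Crovella.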