
When I start my program in cuda-gdb, I get the following output:

[New Thread 0x7fffef8ea700 (LWP 8003)]
[New Thread 0x7fffe35b2700 (LWP 8010)]
[New Thread 0x7fffe2db1700 (LWP 8011)]
[New Thread 0x7fffe25b0700 (LWP 8012)]

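I don't understand why several threads are started right at the beginning. I have not started my program in multithreaded mode. I am using MPI, but I launch only a single process. So where do these threads come from?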

This does not affect my debugging in any way. I just don't understand what it means.


2 Answers


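The threads you are seeing are created by the CUDA libraries (more precisely by libcuda.so.1, the driver library that the CUDA runtime loads, as the backtraces below show). They have nothing directly to do with cuda-gdb itself: if you run the same code under plain gdb, you will see the same messages.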

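If you want to see what these threads are doing or where they come from, compile your code with the -g flag, set a breakpoint somewhere in your code (for example, before a CUDA kernel launch), run the program, and then enter the following command in the gdb console: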

thread apply all backtrace

This command does the same thing as gdb's backtrace, except that it prints a backtrace for every thread in the program.

In my case, I get the following messages after starting the program:

[New Thread 0x7fffeffb3700 (LWP 7141)]
[New Thread 0x7fffef731700 (LWP 7142)]
[New Thread 0x7fffeef30700 (LWP 7143)]

When I run the command above in the gdb console, I see the following output:

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7fffeef30700 (LWP 7143)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ffff63c19b7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff6386bb7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffeef30700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7fffef731700 (LWP 7142)):
#0  0x00007ffff6cc5aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff63bf6a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff642261e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffef731700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fffeffb3700 (LWP 7141)):
#0  0x00007ffff6ccfa9f in accept4 (fd=13, addr=..., addr_len=0x7fffeffb2e18, flags=-1) at ../sysdeps/unix/sysv/linux/accept4.c:45
#1  0x00007ffff63c0556 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff63b404d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffeffb3700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7fc0740 (LWP 7136)):
#0  main () at cuda_heap.cu:66

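As you can verify, the thread addresses and LWP (lightweight process) IDs in these backtraces match the threads reported at startup, and all of them originate in libcuda.so.1.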

In cuda-gdb you can see somewhat more detailed information:

(cuda-gdb) thread apply all bt

Thread 4 (Thread 0x7fffeef30700 (LWP 10019)):
#0  0x00007ffff79c33f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ffff63c19b7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff6386bb7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7fffef731700 (LWP 10018)):
#0  0x00007ffff6cc5aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff63bf6a3 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff642261e in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7fffeffb3700 (LWP 10017)):
#0  0x00007ffff6ccfa9f in accept4 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff63c0556 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff63b404d in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7ffff7fc0740 (LWP 10007)):
#0  main () at cuda_heap.cu:66
answered 2017-08-22T10:04:35.787

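I don't know exactly what they are, but I think cuda-gdb needs to create several threads to catch errors and exceptions such as memory access violations or shared-memory bank conflicts.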

answered 2017-08-22T07:19:08.787