动态并行示例:
__global__ void nestedHelloWorld(int const iSize,int iDepth) {
int tid = threadIdx.x;
printf("Recursion=%d: Hello World from thread %d" "block %d\n",iDepth,tid,blockIdx.x);
// condition to stop recursive execution
if (iSize == 1) return;
// reduce block size to half
int nthreads = iSize>>1;
// thread 0 launches child grid recursively
if(tid == 0 && nthreads > 0) {
nestedHelloWorld<<<1, nthreads>>>(nthreads,++iDepth);
printf("-------> nested execution depth: %d\n",iDepth);
}
}
用一个块打印,整个父网格用两个块完成:
./nestedHelloWorld Execution Configuration: grid 1 block 8
Recursion=0: Hello World from thread 0 block 0
Recursion=0: Hello World from thread 1 block 0
Recursion=0: Hello World from thread 2 block 0
Recursion=0: Hello World from thread 3 block 0
Recursion=0: Hello World from thread 4 block 0
Recursion=0: Hello World from thread 5 block 0
Recursion=0: Hello World from thread 6 block 0
Recursion=0: Hello World from thread 7 block 0
-------> nested execution depth: 1
Recursion=1: Hello World from thread 0 block 0
Recursion=1: Hello World from thread 1 block 0
Recursion=1: Hello World from thread 2 block 0
Recursion=1: Hello World from thread 3 block 0
-------> nested execution depth: 2
Recursion=2: Hello World from thread 0 block 0
Recursion=2: Hello World from thread 1 block 0
-------> nested execution depth: 3
Recursion=3: Hello World from thread 0 block 0
假设我从 threadIdx.x==0 的块中的一个线程启动子网格。我可以假设父网格中的所有其他线程在我启动子网格之前都已完成执行吗?
如果是这样,这是如何工作的?我正在阅读的是父网格在子网格之前没有在技术上完成。没有关于没有启动子线程的其他父线程的保证。