cuda - How should I understand "A block is never split across multiple MPs."?

Question

For CUDA, I understand that "a block is never split across multiple MPs" (http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/).

To test this, I gave the kernel a very large block size:

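// Empty kernel: only the launch configuration matters for this test.
__global__ void dummy()
{
}

int main()
{
        int N=21504*40000; //21504 is the total threads I found for my Tesla M2070
        dim3 grids(1,2);
        // Requests an N x N block, far beyond the per-block thread limit.
        dim3 thres(N,N);
        // The launch result is never checked, so any error goes unnoticed.
        dummy<<<grids,thres>>>();
        return 0;
}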

However, there is no compile-time or run-time error, and I'm not sure what happened...
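The 21504 figure in the question is presumably the device-wide number of resident threads (on a Tesla M2070, 14 SMs × 1536 threads per SM); it says nothing about how large a single block may be. The per-block limits are separate device properties. The sketch below, assuming device 0 and using the runtime's `cudaGetDeviceProperties`, prints both kinds of limit and checks the question's block against the per-block ones. `blockFits` is a hypothetical helper written only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a block of this shape respects the per-block
// limits reported by the device (each dimension and the total thread count).
bool blockFits(dim3 block, const cudaDeviceProp& prop)
{
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    return block.x <= (unsigned)prop.maxThreadsDim[0] &&
           block.y <= (unsigned)prop.maxThreadsDim[1] &&
           block.z <= (unsigned)prop.maxThreadsDim[2] &&
           total <= (unsigned long long)prop.maxThreadsPerBlock;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return 1;
    }
    // Device-wide residency: what the question's 21504 most likely measures.
    std::printf("SMs: %d, max threads per SM: %d, resident threads: %d\n",
                prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor,
                prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor);
    // Per-block limits: what actually constrains the <<<grid, block>>> launch.
    std::printf("max threads per block: %d, max block dims: %d x %d x %d\n",
                prop.maxThreadsPerBlock, prop.maxThreadsDim[0],
                prop.maxThreadsDim[1], prop.maxThreadsDim[2]);

    int N = 21504 * 40000;
    dim3 thres(N, N);
    std::printf("block (%d, %d) fits: %s\n", N, N,
                blockFits(thres, prop) ? "yes" : "no");
    return 0;
}
```

Because one block must fit on one SM, the per-block limit is always far smaller than the device-wide resident-thread count.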

score 2 · Accepted Answer

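If you add cudaGetLastError() after the dummy<<<>>> call, you will get a CUDA launch error (for a block this large, the error is "invalid configuration argument"). You can use cudaGetErrorString(err_code) to convert the error code into a string.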
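A minimal sketch of the check this answer describes, reusing the question's launch configuration. `launchAndCheck` is a hypothetical helper: it reads the launch error with `cudaGetLastError()` and then calls `cudaDeviceSynchronize()` to catch errors raised while the kernel runs. The exact message printed may differ between CUDA versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() {}

// Hypothetical helper: launches dummy and returns the first error observed.
cudaError_t launchAndCheck(dim3 grid, dim3 block)
{
    dummy<<<grid, block>>>();
    // Configuration errors (e.g. too many threads per block) are reported
    // right after the launch by cudaGetLastError().
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // Errors that occur during kernel execution surface at the next
    // synchronizing call.
    return cudaDeviceSynchronize();
}

int main()
{
    int N = 21504 * 40000;
    cudaError_t err = launchAndCheck(dim3(1, 2), dim3(N, N));
    std::printf("%s\n", err == cudaSuccess ? "no error" : cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Checking both points matters: a kernel launch is asynchronous, so without these calls the program in the question simply returns 0 and the failure is silent.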

score 0 · Accepted Answer

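These are not compile errors; you only run into them at run time. To understand the sentence, you need to know the architecture: threads placed in the same thread block are meant to communicate quickly through the shared memory of a single MP (SM). So all threads of a block reside on the same SM and are never distributed across several SMs.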
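To illustrate why a block has to stay on one SM, here is a sketch of a standard shared-memory block reduction (a common pattern, not code from this answer). All threads of a block write into one `__shared__` array and wait at `__syncthreads()`; both only work because the whole block runs on a single SM. The names `blockSum` and `runBlockSum` and the block size of 256 are illustrative choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // threads per block (illustrative, power of two)

// Each block sums kBlock consecutive inputs through its own shared array.
__global__ void blockSum(const int* in, int* out)
{
    __shared__ int buf[kBlock];  // one copy per block, held in that SM's shared memory
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * kBlock + tid];
    __syncthreads();  // barrier across the threads of this block only
    for (int stride = kBlock / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

// Hypothetical helper: sums blocks * kBlock ones; per-block sums go to hostOut.
cudaError_t runBlockSum(int blocks, int* hostOut)
{
    int n = blocks * kBlock;
    int* hostIn = new int[n];
    for (int i = 0; i < n; ++i) hostIn[i] = 1;
    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, blocks * sizeof(int));
    cudaMemcpy(dIn, hostIn, n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<blocks, kBlock>>>(dIn, dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    delete[] hostIn;
    return err;
}

int main()
{
    const int blocks = 4;
    int sums[blocks];
    cudaError_t err = runBlockSum(blocks, sums);
    if (err != cudaSuccess) {
        std::printf("error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int b = 0; b < blocks; ++b) std::printf("block %d sum = %d\n", b, sums[b]);
    return 0;
}
```

Each block reduces its own inputs independently and never touches another block's `buf`, which is why different blocks can be scheduled on different SMs while a single block cannot be split.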
