
I am solving AX = B with cuSPARSE 1440 times. On each solve only the diagonal elements of A are modified, and B is also different. I create the handle and the analysis object just once.

I am using cusparseScsrilu0().

I want to run these solves using streams. I tried one handle with multiple streams, but I did not get any speed-up.

Please help me with this problem.


1 Answer


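You expect a multi-stream design to make your CUDA kernels execute concurrently. However, multiple streams do not always lead to concurrent kernel execution. Kernels can run concurrently only when certain prerequisites are met. One of the most important is that each kernel occupies only a small share of the hardware resources (SMs, textures, local memory, etc.). So if your problem is large enough, there will be no spare resources left for another kernel to run at the same time.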
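To make the resource argument concrete, here is a rough host-side sketch in C++. It only counts resident thread-block slots and ignores registers, shared memory and everything else the scheduler considers, so treat it as a back-of-envelope estimate. The `DeviceModel` struct, both function names, and the numbers in the note below are hypothetical and not taken from the question or from any CUDA API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified description of a GPU for this estimate only.
struct DeviceModel {
    int sm_count;           // number of streaming multiprocessors (assumed value)
    int max_blocks_per_sm;  // resident blocks one SM can hold for this kernel (assumed value)
};

// Total number of thread blocks that can be resident on the device at once.
int block_capacity(const DeviceModel& dev) {
    assert(dev.sm_count > 0 && dev.max_blocks_per_sm > 0);
    return dev.sm_count * dev.max_blocks_per_sm;
}

// Fraction of the resident-block slots a single launch occupies, capped at 1.0.
double occupied_fraction(const DeviceModel& dev, int grid_blocks) {
    return std::min(1.0, static_cast<double>(grid_blocks) / block_capacity(dev));
}

// True if the launch leaves at least one free slot for a kernel from another stream.
bool leaves_room(const DeviceModel& dev, int grid_blocks) {
    return grid_blocks < block_capacity(dev);
}
```

With an assumed device of 14 SMs holding 8 resident blocks each (112 slots), a 16-block launch leaves room for another kernel, while a 4096-block launch fills every slot, so kernels issued on other streams would have to wait.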

Answered 2013-07-21T08:17:29.710