I have in mind to use getrf and getrs from the cuSolver package and to solve AX=B with B=I.

  • Is this the best way to solve this problem?

  • If so, what is the best way to create the col-major identity matrix B in device memory? It can be done trivially using a for loop but this would 1. take up a lot of memory and 2. be quite slow. Is there a faster way?

Note that cuSolver unfortunately does not provide getri, so I have to use getrs.

1 Answer

Until CUDA provides the LAPACK API getri, I think getrf and getrs are the best choice for inverting large matrices.

The matrix B has the same size as A, so I don't think allocating B makes this task consume much more memory than its input/output data.

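The complexity of getrf is O(n^3) and that of getrs is O(n^2) per right-hand side (so O(n^3) for the n columns of B=I), while setting B=I is O(n^2) + O(n) (for example, an O(n^2) zero fill followed by O(n) writes to the diagonal). I don't think this should be the bottleneck of the whole procedure. You could share your implementation so that we can check where the problem might be.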
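For reference, the approach described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: double precision, a square n x n column-major matrix with leading dimension n, and a cuSOLVER handle that runs on the default stream. The names setDiagonal and invertWithGetrfGetrs are hypothetical helpers made up for illustration; only the cudaMemset, cudaMalloc, cudaMemcpy, cudaFree, cusolverDnDgetrf_bufferSize, cusolverDnDgetrf and cusolverDnDgetrs calls come from CUDA and cuSOLVER.

```cuda
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Hypothetical helper: writes 1.0 on the diagonal of an n x n column-major
// matrix stored with leading dimension n. One thread per diagonal entry.
__global__ void setDiagonal(double *B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) B[(size_t)i * n + i] = 1.0;
}

// Hypothetical helper: overwrites d_A (device, n x n, column-major) with its
// LU factors and writes inv(A) into d_B (device, n x n, column-major).
// Returns 0 on success and a nonzero value on failure (the cuSOLVER devInfo
// value when one is available, otherwise -1).
int invertWithGetrfGetrs(cusolverDnHandle_t handle, double *d_A, double *d_B, int n) {
    int lwork = 0, info = 0;
    double *d_work = nullptr;
    int *d_ipiv = nullptr, *d_info = nullptr;

    // B = I on the device: zero fill, then set the n diagonal entries.
    bool ok = cudaMemset(d_B, 0, sizeof(double) * n * n) == cudaSuccess;
    if (ok) {
        setDiagonal<<<(n + 255) / 256, 256>>>(d_B, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Workspace, pivot array and status flag for the factorization.
    ok = ok && cusolverDnDgetrf_bufferSize(handle, n, n, d_A, n, &lwork) == CUSOLVER_STATUS_SUCCESS;
    ok = ok && cudaMalloc(&d_work, sizeof(double) * lwork) == cudaSuccess;
    ok = ok && cudaMalloc(&d_ipiv, sizeof(int) * n) == cudaSuccess;
    ok = ok && cudaMalloc(&d_info, sizeof(int)) == cudaSuccess;

    // LU factorization with partial pivoting: P * A = L * U.
    ok = ok && cusolverDnDgetrf(handle, n, n, d_A, n, d_work, d_ipiv, d_info) == CUSOLVER_STATUS_SUCCESS;
    ok = ok && cudaMemcpy(&info, d_info, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;

    // Solve A * X = I with n right-hand sides; X overwrites d_B.
    if (ok && info == 0) {
        ok = cusolverDnDgetrs(handle, CUBLAS_OP_N, n, n, d_A, n, d_ipiv, d_B, n, d_info) == CUSOLVER_STATUS_SUCCESS;
        ok = ok && cudaMemcpy(&info, d_info, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_work);
    cudaFree(d_ipiv);
    cudaFree(d_info);
    return ok ? info : -1;
}
```

Building B=I entirely on the device avoids both a host-side loop and an n x n host-to-device copy. Note that getrf overwrites A with its LU factors, so a copy of A is needed if the original matrix has to be kept.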

Answered 2018-08-25T16:42:21.410