I am running a conjugate gradient algorithm to solve a linear system of size 96 x 96. With the same code, the same number of iterations and the same precision (double precision), the GeForce 480 takes about 33.6 ms, while the Tesla C2070 takes about 132.1 ms, almost 4 times as long as the GeForce 480!
Do you find this normal? Has anyone seen similar results, or am I doing something wrong?
Thanks a lot!
I stumbled on this post while searching for conjugate gradient material.
For this matrix size (96 x 96), conjugate gradient is overkill: a direct Cholesky factorization should be much faster. Likewise, using a GPU does not seem useful here, unless you are solving a whole batch of such systems in parallel; see the sketch below for what the direct CPU solve looks like.
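To make the Cholesky suggestion concrete, here is a minimal CPU sketch (my own illustration, not the original poster's code; the size N = 96 and the test matrix built in main are assumptions for the demo). It factorizes a symmetric positive definite A into L L^T in place and then solves A x = b by forward and back substitution:

// Minimal sketch: solve A x = b for a small SPD system via Cholesky on the CPU.
// N = 96 and the test matrix below are assumptions for illustration only.
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

static const int N = 96;

// In-place Cholesky factorization: the lower triangle of a (row-major, N x N)
// is overwritten with L such that A = L * L^T.  Returns false if A is not SPD.
bool cholesky(std::vector<double>& a)
{
    for (int j = 0; j < N; ++j) {
        double d = a[j * N + j];
        for (int k = 0; k < j; ++k)
            d -= a[j * N + k] * a[j * N + k];
        if (d <= 0.0)
            return false;                      // not symmetric positive definite
        a[j * N + j] = std::sqrt(d);
        for (int i = j + 1; i < N; ++i) {
            double s = a[i * N + j];
            for (int k = 0; k < j; ++k)
                s -= a[i * N + k] * a[j * N + k];
            a[i * N + j] = s / a[j * N + j];
        }
    }
    return true;
}

// Solve L y = b (forward substitution), then L^T x = y (back substitution).
void solve(const std::vector<double>& l, std::vector<double>& x,
           const std::vector<double>& b)
{
    for (int i = 0; i < N; ++i) {              // forward substitution
        double s = b[i];
        for (int k = 0; k < i; ++k)
            s -= l[i * N + k] * x[k];
        x[i] = s / l[i * N + i];
    }
    for (int i = N - 1; i >= 0; --i) {         // back substitution
        double s = x[i];
        for (int k = i + 1; k < N; ++k)
            s -= l[k * N + i] * x[k];          // L^T accessed via transposed indexing
        x[i] = s / l[i * N + i];
    }
}

int main()
{
    // Build a symmetric, diagonally dominant (hence SPD) test matrix.
    std::vector<double> a(N * N, 0.0), b(N, 1.0), x(N, 0.0);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[i * N + j] = 1.0 / (1.0 + std::abs(i - j)) + (i == j ? N : 0.0);

    if (!cholesky(a)) { std::printf("matrix is not SPD\n"); return 1; }
    solve(a, x, b);
    std::printf("x[0] = %g\n", x[0]);
    return 0;
}

For a dense 96 x 96 system this direct solve costs on the order of N^3/3, roughly 3e5 floating-point operations, which a single CPU core finishes in well under a millisecond, so there is essentially nothing left for the GPU to accelerate.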
As for the performance difference, there may be several explanations, but I would suggest that the iterative part of the CG algorithm is the limiting factor: at this system size each iteration is dominated by kernel-launch overhead and CPU-GPU transfers rather than by arithmetic, and the GeForce may simply have lower latency for launching kernels and communicating with the CPU. The small probe below shows one way to measure that per-iteration cost on each card.
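As a rough way to test the latency hypothesis, one could time an empty kernel launch followed by an 8-byte device-to-host copy, which is roughly the overhead each CG iteration pays when it reads back a dot product for the convergence test. This is only a sketch of such a probe (the loop count and names are my own, nothing here comes from the original poster's code); running it on both the GeForce 480 and the C2070 would show how much of the 33.6 ms vs 132.1 ms gap is pure per-iteration latency:

// Latency probe sketch: average cost of one tiny kernel launch plus a small
// device-to-host readback, i.e. the fixed overhead of one CG-style iteration.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void empty_kernel() {}

int main()
{
    const int iters = 1000;
    double h = 0.0;
    double *d = NULL;
    cudaMalloc((void**)&d, sizeof(double));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm up so context creation is not counted.
    empty_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        empty_kernel<<<1, 1>>>();                                   // launch latency
        cudaMemcpy(&h, d, sizeof(double), cudaMemcpyDeviceToHost);  // small readback
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("avg launch + 8-byte readback: %.3f us\n", 1000.0f * ms / iters);

    cudaFree(d);
    return 0;
}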