gpu - Significant performance difference for a conjugate gradient algorithm between GeForce 480 and Tesla C2070

Question

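I am running a conjugate gradient algorithm to solve a linear system of size 96 x 96. With the same code, the same number of iterations, and the same precision (double), it takes about 33.6 ms on the GeForce 480 but about 132.1 ms on the Tesla C2070, almost 4 times as long!

Does this seem normal to you? Has anyone seen similar results, or am I doing something wrong?

Thanks a lot!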

score 0 · Accepted Answer

I stumbled on this post while looking for information on conjugate gradient.

For a matrix this small (96x96), conjugate gradient is overkill: a direct solver such as Cholesky decomposition should be much faster (CG already assumes a symmetric positive-definite matrix, which is exactly what Cholesky requires). Likewise, a GPU is unlikely to help here unless you solve many such systems in parallel.
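To illustrate the direct approach, here is a minimal sketch of a Cholesky solve in plain Python. The test matrix built by `make_spd` is a hypothetical stand-in (the asker's actual system is not shown), and the function names are my own; in practice you would call an optimized library routine rather than hand-written loops.

```python
import math

def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive-definite A (list of lists)."""
    n = len(A)
    # Factor A = L L^T, with L lower triangular.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def make_spd(n):
    """Hypothetical test matrix: tridiagonal, diagonally dominant, hence SPD."""
    return [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

n = 96
A = make_spd(n)
b = [1.0] * n
x = cholesky_solve(A, b)
# Largest entry of |A x - b|, as a sanity check on the solution.
residual = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))
```

The factorization runs a fixed number of steps with no convergence test, which is part of why a direct method suits a system of this size.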

As for the performance difference, there are several possible explanations, but my guess is that the iterative part of the CG algorithm is the bottleneck, again because the system is so small: each iteration does very little arithmetic, so fixed per-step overheads probably dominate, and the GeForce may simply have lower latency and faster communication with the CPU.
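To make the "iterative part" concrete, the sketch below runs textbook CG in plain Python and counts the separate vector operations per iteration. It assumes, without having seen the asker's code, that a straightforward GPU implementation would issue each of these as its own kernel launch or BLAS call; the matrix is a hypothetical SPD example and the helper names are my own.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook CG for SPD A; also counts the vector operations it performs."""
    n = len(b)
    ops = {"matvec": 0, "dot": 0, "axpy": 0}

    def matvec(M, v):
        ops["matvec"] += 1
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        ops["dot"] += 1
        return sum(ui * vi for ui, vi in zip(u, v))

    def axpy(a, u, v):
        # Returns a * u + v.
        ops["axpy"] += 1
        return [a * ui + vi for ui, vi in zip(u, v)]

    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    iterations = 0
    while iterations < max_iter and rs_old ** 0.5 > tol:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)      # step length needs a scalar result
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)               # another scalar result
        p = axpy(rs_new / rs_old, p, r)  # next direction depends on rs_new
        rs_old = rs_new
        iterations += 1
    return x, iterations, ops

# Hypothetical 96 x 96 SPD test matrix (tridiagonal, diagonally dominant).
n = 96
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iterations, ops = conjugate_gradient(A, b)
```

Each iteration performs one matrix-vector product (96 x 96 = 9216 multiply-adds for a dense matrix), two dot products, and three vector updates, and every step depends on the previous one. If each step is a separate launch and the scalars are read back on the host, the run time would be governed by launch and transfer latency rather than arithmetic throughput, which is consistent with the guess above but not confirmed by the post.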
