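I would like to know whether there is a way to optimise the MATLAB function gather. I am developing CUDA code that works with very large matrices, and I have noticed that gather, which I have to use to get my data back from the GPU, is very slow. For a 2^13x2^8 matrix it takes about 3 seconds!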

1 Answer

The performance of gpuArray.gather is limited by the bandwidth of your PCI bus, so the only way to speed up the transfer itself is to gather less data (e.g. by indexing into the array on the GPU first). Note that in more recent versions of Parallel Computing Toolbox many operations run asynchronously, but gather does not, so you may simply be seeing the time it takes for earlier asynchronous requests to complete. You can check this by calling wait(gpuDevice) to synchronize the device before timing gather.
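A minimal sketch of that check, assuming a GPU supported by Parallel Computing Toolbox is available. The matrix contents and the arithmetic on it are placeholders standing in for your own CUDA work, and the variable names are only illustrative:

```matlab
% Placeholder data with the size mentioned in the question.
A = gpuArray(rand(2^13, 2^8));

% Placeholder GPU work; this stands in for your own kernel and
% may return before the GPU has actually finished.
B = A * 2 + 1;

% Time how long it takes for the queued GPU work to finish.
dev = gpuDevice;
tic; wait(dev); tSync = toc;

% With the device idle, this measures the device-to-host copy alone.
tic; hostB = gather(B); tFull = toc;

% Gathering a subset: index on the GPU first, then transfer less data.
tic; hostPart = gather(B(:, 1:16)); tPart = toc;

fprintf('sync: %.4f s, full gather: %.4f s, partial gather: %.4f s\n', ...
        tSync, tFull, tPart);
```

If tSync accounts for most of the 3 seconds, the time is being spent in the computation rather than in the transfer, and reducing the amount of gathered data will not help much.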

Answered 2013-03-27T14:21:35.550