
I need to access a variable on both the CPU and a CUDA GPU. Currently, I copy that variable to the CPU after the kernel finishes, but this is turning out to be a bottleneck in my application. Is there a faster way to access a variable on the CPU after the GPU finishes execution? Can pinned memory help me here?


1 Answer

1

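You are asking whether you should use pinned memory, so I assume you are not using it yet. That also means you are not doing asynchronous memcpy, because an asynchronous copy requires pinned memory.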

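So, to answer your question: yes, you should use pinned memory, together with streams and the asynchronous memory transfer functions, to get your results as quickly as possible.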
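As a rough illustration of what that could look like, here is a minimal sketch, assuming a single int result. The kernel `compute` and the helper `fetch_result_pinned` are hypothetical names made up for this example and are not from your code; only the CUDA runtime calls (`cudaMallocHost`, `cudaMemcpyAsync`, `cudaStreamSynchronize`, and so on) are real API.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real work: one thread
// writes a single result value into device memory.
__global__ void compute(int *d_result, int input) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *d_result = input * 2;
    }
}

// Hypothetical helper: runs the kernel and copies the result back through
// pinned (page-locked) host memory on a non-default stream.
// Returns the value computed on the GPU, or -1 if any CUDA call fails.
int fetch_result_pinned(int input) {
    int *h_result = nullptr;   // pinned host buffer
    int *d_result = nullptr;   // device buffer
    cudaStream_t stream = nullptr;
    int out = -1;

    // Page-locked host allocation; required for a truly asynchronous copy.
    bool ok = cudaMallocHost((void **)&h_result, sizeof(int)) == cudaSuccess
           && cudaMalloc((void **)&d_result, sizeof(int)) == cudaSuccess
           && cudaStreamCreate(&stream) == cudaSuccess;

    if (ok) {
        // Kernel and copy are queued on the same stream, so the copy
        // starts only after the kernel has finished.
        compute<<<1, 32, 0, stream>>>(d_result, input);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpyAsync(h_result, d_result, sizeof(int),
                             cudaMemcpyDeviceToHost, stream) == cudaSuccess;

        // The host is free to do unrelated work here; h_result must not
        // be read until the stream has been synchronized.
        ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
        if (ok) {
            out = *h_result;
        }
    }

    // Release whatever was successfully created.
    if (stream)   cudaStreamDestroy(stream);
    if (d_result) cudaFree(d_result);
    if (h_result) cudaFreeHost(h_result);
    return out;
}
```

In a real application you would allocate the pinned buffer and the stream once and reuse them, since page-locked allocation is itself relatively expensive. Whether this removes the bottleneck depends on how much data you move and how much host work can overlap with the copy.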

See also http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution and http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#page-locked-host-memory

Answered 2013-02-11T18:39:53.187