I'm trying to profile OpenCL application, a.out
, in a system with NVIDIA TITAN X and CUDA 8.0.
If it was CUDA application, nvprof ./a.out
would be enough. But I found this does not work with OpenCL application, with a message "No kernels were profiled."
Until CUDA 7.5, I successfully used COMPUTE_PROFILE=1
following this. Unfortunately, the documentation says "The support for command-line profiler using the environment variable COMPUTE_PROFILE has been dropped in the CUDA 8.0 release."
The question is, is there any way other than downgrading CUDA to profile OpenCL application with nvprof?