I have a small CUDA program that I want to profile with `nvprof`. The problem is that I want to write the program in such a way that:
- When I run `nvprof my_prog`, it will invoke `cudaProfilerStart` and `cudaProfilerStop`.
- When I run `my_prog`, it will not invoke any of the above APIs, and can therefore avoid the profiling overhead.
The problem hence becomes how to make my code aware of the presence of `nvprof` when it runs, without an additional command-line argument.
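To make the goal concrete, here is a minimal sketch of the structure I have in mind. The detection function is the missing piece: `profiler_detected` and the environment variable name `MY_PROG_PROFILING` are hypothetical placeholders, not something `nvprof` is known to provide, and `ProfilerRange` is just an illustrative helper name. The stand-in definitions in the `#else` branch exist only so the control flow can be compiled without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdlib>

#ifdef __CUDACC__
#include <cuda_profiler_api.h>  // real cudaProfilerStart / cudaProfilerStop
#else
// Host-only stand-ins so the sketch compiles without nvcc.
inline int cudaProfilerStart() { return 0; }
inline int cudaProfilerStop() { return 0; }
#endif

// HYPOTHETICAL detection hook: returns true if the named environment
// variable is set and non-empty. Whether nvprof exposes anything that can
// be checked like this is exactly the open question; "MY_PROG_PROFILING"
// is a placeholder name.
inline bool profiler_detected(const char* env_var_name = "MY_PROG_PROFILING") {
    assert(env_var_name != nullptr);
    const char* value = std::getenv(env_var_name);
    return value != nullptr && value[0] != '\0';
}

// Illustrative RAII helper: calls cudaProfilerStart on construction and
// cudaProfilerStop on destruction, but only when enabled. When disabled,
// neither API is invoked.
class ProfilerRange {
public:
    explicit ProfilerRange(bool enabled) : enabled_(enabled) {
        if (enabled_) cudaProfilerStart();
    }
    ~ProfilerRange() {
        if (enabled_) cudaProfilerStop();
    }
    ProfilerRange(const ProfilerRange&) = delete;
    ProfilerRange& operator=(const ProfilerRange&) = delete;

    bool enabled() const { return enabled_; }

private:
    bool enabled_;
};
```

In `main`, I would then wrap the kernel launches in a scope containing `ProfilerRange range(profiler_detected());`, so that a plain `my_prog` run never touches the profiler APIs.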