"Threads don’t stall on memory access"
This quote is from the famous paper http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf by Vasily Volkov.
Based on this statement, I am assuming that this:
__device__ int a;
int b, c, d;
a = b * c;
// Do some work that is independent of 'a'
// ...
d = a + 1;
is faster than this:
__device__ int a;
int b, c, d;
a = b * c;
d = a + 1;
// Do some work that is independent of 'a'
// ...
I am assuming this only because the first version gives the thread a chance to execute other instructions while the write to global memory is in flight, whereas the second version does not.
Is my assumption right?
And if my assumption is right, is it good practice to set all variables that will be used later at the beginning of the kernel, given that they are independent of each other and assuming that 'a' is not cached?
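To make the comparison concrete, here is a minimal sketch of both orderings as complete kernels that can be compiled with nvcc. The helper `independent_work`, the kernel names `use_late` and `use_early`, the host function `orderings_agree`, and the inputs 3 and 4 are placeholders made up for illustration. Each kernel is launched with a single thread so that there is no race on `a`. It only checks that both orderings compute the same result; it does not measure which one is faster.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global-memory variable from the question. A __device__ variable has to be
// declared at file scope, not inside a kernel.
__device__ int a;

// Hypothetical stand-in for "work that is independent of 'a'".
__device__ int independent_work(int x)
{
    int acc = x;
    for (int i = 0; i < 64; ++i)
        acc = (acc * 3 + i) % 1000003;  // kept small to avoid signed overflow
    return acc;
}

// First ordering: write 'a', do the independent work, then use 'a'.
__global__ void use_late(int b, int c, int *out)
{
    a = b * c;
    int w = independent_work(b);
    int d = a + 1;
    out[0] = d;
    out[1] = w;
}

// Second ordering: write 'a', use it immediately, then do the independent work.
__global__ void use_early(int b, int c, int *out)
{
    a = b * c;
    int d = a + 1;
    int w = independent_work(b);
    out[0] = d;
    out[1] = w;
}

// Runs both kernels with one thread each and checks that they agree.
bool orderings_agree()
{
    int *d_out = nullptr;
    int h_late[2] = {0, 0};
    int h_early[2] = {0, 0};
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess)
        return false;

    use_late<<<1, 1>>>(3, 4, d_out);
    // cudaMemcpy waits for the preceding kernel on the default stream.
    bool ok = cudaMemcpy(h_late, d_out, sizeof(h_late),
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    use_early<<<1, 1>>>(3, 4, d_out);
    ok = ok && cudaMemcpy(h_early, d_out, sizeof(h_early),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_out);
    // d = 3 * 4 + 1 = 13 in both orderings; the independent work must match too.
    return ok && h_late[0] == 13 && h_early[0] == 13 && h_late[1] == h_early[1];
}

int main()
{
    bool ok = orderings_agree();
    printf("%s\n", ok ? "both orderings produce d = 13"
                      : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Since the compiler may reorder independent instructions or keep `b * c` in a register instead of reading `a` back from global memory, the source order of the two kernels need not match the order of the generated instructions. Comparing the machine code of both kernels (for example with `cuobjdump -sass`) or timing them with CUDA events would be one way to check whether the ordering makes any difference in practice.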