I am trying to generate "random" numbers from a uniform distribution inside a CUDA __global__ kernel using two different approaches. The first uses the cuRAND device API, and the second uses thrust. For each approach I have created a different class.
Here is my cuRAND solution:
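#include <curand_kernel.h>

template<typename T>
struct RNG1
{
    // seed = tid, sequence = tid, offset = 0
    __device__
    RNG1(unsigned int tid) {
        curand_init(tid, tid, 0, &state);
    }
    // curand_uniform returns a float in (0, 1], converted to T here
    __device__ T
    operator ()(void) {
        return curand_uniform(&state);
    }
    curandState state;
};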
And here is my thrust solution:
#include <thrust/random.h>

template<typename T>
struct RNG2
{
    // seed the engine with tid, then skip tid values
    __device__
    RNG2(unsigned int tid)
        : gen(tid)
        , dis(0, 1) { gen.discard(tid); }
    __device__ T
    operator ()(void) {
        return dis(gen);
    }
    thrust::default_random_engine gen;
    thrust::uniform_real_distribution<T> dis;
};
The way I use them is the following:
template<typename T> __global__ void
mykernel(/* args here */)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    RNG1<T> rng(tid);
    // or, alternatively:
    // RNG2<T> rng(tid);
    T a_random_number = rng();
    // do stuff here
}
Both of them work, but the cuRAND solution is much slower (more than 3 times slower). If I set the second parameter of curand_init (the sequence number) to 0, then the performance is the same as that of the thrust solution, but the random numbers are "bad": I can see patterns and artefacts in the resulting distribution.
Here are my questions:
- Can someone explain to me why the cuRAND solution with a non-zero sequence is slower?
- How can thrust be as fast as cuRAND with zero sequence, but also generate good random numbers?
- While searching on Google, I noticed that most people use cuRAND, and very few use thrust to generate random numbers inside device code. Is there something I should be aware of? Am I misusing thrust?
Thank you.