I don't have a Fermi card at the moment, but my target platform is Tesla/Fermi. My question is whether Fermi supports using OpenMP like this:
#pragma omp parallel for num_threads(N)
for (int i = 0; i < 1000; ++i)
{
    // Each OpenMP thread launches its kernels into its own stream.
    int threadID = omp_get_thread_num();
    cudafunctions<<<blocks, threads, 1024, streams[threadID]>>>(input + i * colsizeofinput);
} // N streams have been created beforehand
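To make the question concrete, here is a minimal sketch of the whole setup I have in mind. It is only an illustration under assumptions, not tested on Fermi: `scale_column` is a hypothetical stand-in for `cudafunctions`, `col_size` stands in for `colsizeofinput`, the values of `N`, `threads`, and `blocks` are made up, and the dynamic shared memory size is set to 0 because the stand-in kernel uses none. It also assumes a CUDA 4.0 or later runtime, where host threads of one process share the same device context.

```cuda
#include <cstdio>
#include <vector>
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for `cudafunctions`: doubles one column in place.
__global__ void scale_column(float *col, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        col[idx] *= 2.0f;
}

// Returns the number of elements with an unexpected value (0 on success),
// or -1 if a CUDA call failed.
int run_demo()
{
    const int N = 4;            // assumed: number of host threads and streams
    const int num_cols = 1000;  // loop trip count from the snippet above
    const int col_size = 256;   // assumed: stands in for colsizeofinput
    const size_t total = (size_t)num_cols * col_size;
    const size_t bytes = total * sizeof(float);

    std::vector<float> host(total, 1.0f);
    float *input = NULL;
    if (cudaMalloc(&input, bytes) != cudaSuccess)
        return -1;
    cudaMemcpy(input, &host[0], bytes, cudaMemcpyHostToDevice);

    // Create the N streams before entering the parallel region.
    cudaStream_t streams[N];
    for (int s = 0; s < N; ++s)
        cudaStreamCreate(&streams[s]);

    const int threads = 128;
    const int blocks = (col_size + threads - 1) / threads;

    // Each OpenMP thread launches its share of the columns into its own stream.
    #pragma omp parallel for num_threads(N)
    for (int i = 0; i < num_cols; ++i)
    {
        int threadID = omp_get_thread_num();  // always in [0, N)
        scale_column<<<blocks, threads, 0, streams[threadID]>>>(
            input + (size_t)i * col_size, col_size);
    }

    // Wait for all streams, then copy the result back for checking.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(&host[0], input, bytes, cudaMemcpyDeviceToHost);

    for (int s = 0; s < N; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFree(input);

    if (err != cudaSuccess)
        return -1;

    int bad = 0;
    for (size_t k = 0; k < total; ++k)
        if (host[k] != 2.0f)
            ++bad;
    return bad;
}
```

Calling `run_demo()` from `main` and checking for a return value of 0 would be the intended use. I would expect to build it with something like `nvcc -Xcompiler -fopenmp demo.cu` on a GCC host toolchain, but that is an assumption about the build environment as well.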