c++ - Time measured with clock() is always zero - linux

Question

I have CUDA code that performs a computation on the GPU. I am using clock() to measure how long it takes.

My code structure is:

__global__ static void sum(){

// calculates sum 
}

extern "C"
int run_kernel(int array[],int nelements){
 clock_t start, end;
  start = clock();
  //perform operation on gpu - call sum
 end = clock();
 double elapsed_time = ((double) (end - start)) / CLOCKS_PER_SEC;
 printf("time required : %lf", elapsed_time);
}

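But the time is always 0.0000. I checked by printing the start and end times: start has some value, but end is always zero.

Any idea what the cause might be? Are there any alternative ways to measure the time?

Any help would be appreciated.

Thanks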

score 7 · Accepted Answer

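There are two problems here:

1. The clock() function has too low a resolution to measure the duration of the event you are trying to time.
2. A CUDA kernel launch is an asynchronous operation, so it consumes almost no time on the host (typically 10-20 microseconds on a sane platform). Unless you use a synchronous CUDA API call to force the host CPU to block until the kernel has finished running, you will not measure the execution time.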

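CUDA has its own high-precision timing API, and it is the recommended way to time operations that run on the GPU. Code using it would look something like this: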

int run_kernel(int array[],int nelements){

    cudaEvent_t start,stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);

    //
    //perform operation on gpu - call sum
    //

    cudaEventRecord(stop, 0); 
    cudaEventSynchronize(stop); 
    float elapsedTime; 
    cudaEventElapsedTime(&elapsedTime, start, stop); 
    printf("time required : %f ms\n", elapsedTime); // cudaEventElapsedTime reports milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

score 5 · Accepted Answer

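Don't use clock to time CUDA kernel launches. Use cudaEventElapsedTime. Even if clock had high enough precision to time your kernel (it doesn't), kernel launches are asynchronous, which means control flow returns to your calling function before the kernel is complete.

Here's how: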

void run_kernel(...)
{
  // create "events" which record the start & finish of the kernel of interest
  cudaEvent_t start, end;
  cudaEventCreate(&start);
  cudaEventCreate(&end);

  // record the start of the kernel
  cudaEventRecord(start);

  // perform operation on gpu - call sum
  sum<<<...>>>(...);

  // record the end of the kernel
  cudaEventRecord(end);

  // wait for the end event, and therefore the kernel, to complete;
  // cudaEventElapsedTime does not block by itself
  cudaEventSynchronize(end);

  // get elapsed time in milliseconds
  float ms;
  cudaEventElapsedTime(&ms, start, end);

  printf("time required : %f milliseconds", ms);

  cudaEventDestroy(start);
  cudaEventDestroy(end);
}

score 0 · Accepted Answer

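I believe that nowadays you should use clock_gettime() with CLOCK_MONOTONIC to measure elapsed time at high resolution. On my computer the resolution is 1 ns, which is decent enough.

You can use it like this: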

#include <time.h>
...

struct timespec start, end, res;

clock_getres(CLOCK_MONOTONIC, &res);
/* exact format string depends on your system, on mine time_t is long */
printf("Resolution is %ld s, %ld ns\n", res.tv_sec, res.tv_nsec);

clock_gettime(CLOCK_MONOTONIC, &start);
/* whatever */
clock_gettime(CLOCK_MONOTONIC, &end);

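Compile with -lrt (needed with glibc versions older than 2.17; newer versions provide these functions in libc itself).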

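Edit: I see I took the wrong approach here; obviously you should use CUDA timing if that is what you need. I followed the lines of your question, where you were timing with the system clock.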

score 0 · Accepted Answer

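CUDA kernel launches are asynchronous, so you have to add cudaThreadSynchronize() after the kernel before reading the end time. (In current CUDA versions cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() is its replacement.)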

answered 2012-04-30T12:56:32.370
