c++ - Timing issue with clock_gettime() in CUDA

Question

I want to write a CUDA program that lets me see first-hand the speedup CUDA offers for an application.

I have written a CUDA program using Thrust ( http://code.google.com/p/thrust/ ); the full code is listed at the end of this question.

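In short, the code creates two integer vectors of length 2^23, one on the host and one on the device, identical to each other, and sorts them. It also (attempts to) measure the time taken for each sort.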

On the host vector I use std::sort. On the device vector I use thrust::sort.

To compile, I used

nvcc sortcompare.cu -lrt

The output of the program in the terminal is

Desktop: ./a.out

Host Time taken is: 19 . 224622882 seconds

Device Time taken is: 19 . 321644143 seconds

Desktop:

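The first std::cout statement is produced after 19.224 seconds, as reported. However, the second std::cout statement (even though it says 19.32 seconds) is produced immediately after the first one. Note that I used separate timespec variables, ts_host and ts_device, for the two clock_gettime() measurements.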

I am using CUDA 4.0 and an NVIDIA GTX 570 with compute capability 2.0.

    #include<iostream>
    #include<vector>
    #include<algorithm>
    #include<stdlib.h>

    //For timings
    #include<time.h>
    //Necessary thrust headers
    #include<thrust/sort.h>
    #include<thrust/host_vector.h>
    #include<thrust/device_vector.h>
    #include<thrust/copy.h>


    int main(int argc, char *argv[])
    {
      int N=23;
      thrust::host_vector<int>H(1<<N);//create a vector of 2^N elements on host
      thrust::device_vector<int>D(1<<N);//The same on the device.
      thrust::host_vector<int>dummy(1<<N);//Copy the D to dummy from GPU after sorting 

       //Set the host_vector elements. 
      for (int i = 0; i < H.size(); ++i)    {
          H[i]=rand();//Set the host vector element to pseudo-random number.
        }

      //Sort the host_vector. Measure time
      // Reset the clock
        timespec ts_host;
        ts_host.tv_sec = 0;
        ts_host.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_host);//Start clock

             thrust::sort(H.begin(),H.end());

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_host);//Stop clock
        std::cout << "\nHost Time taken is: " << ts_host.tv_sec<<" . "<< ts_host.tv_nsec <<" seconds" << std::endl;


        D=H; //Set the device vector elements equal to the host_vector
      //Sort the device vector. Measure time.
        timespec ts_device;
        ts_device.tv_sec = 0;
            ts_device.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_device);//Start clock

             thrust::sort(D.begin(),D.end());
             thrust::copy(D.begin(),D.end(),dummy.begin());


        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_device);//Stop clock
        std::cout << "\nDevice Time taken is: " << ts_device.tv_sec<<" . "<< ts_device.tv_nsec <<" seconds" << std::endl;

      return 0;
    }

score 1 · Accepted Answer

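You are not checking the return value of clock_settime. My guess is that it is failing, probably with errno set to EPERM or EINVAL. Read the documentation and always check your return values!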
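To see what such a check might look like, here is a minimal sketch. The helper name try_reset_clock is made up for illustration and is not part of the question's code; it only uses clock_settime, errno and std::strerror. Which error you actually get (if any) depends on your system, so treat the message it prints as the thing to look at rather than assuming a particular value.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <time.h>

// Hypothetical helper (not from the question): attempts to reset a clock to
// zero, as the question's code does, but checks the return value.
// Returns 0 on success, otherwise the errno value set by clock_settime.
int try_reset_clock(clockid_t clock_id)
{
    timespec zero;
    zero.tv_sec = 0;
    zero.tv_nsec = 0;

    if (clock_settime(clock_id, &zero) != 0) {
        const int err = errno;  // save errno before any other call can change it
        std::cerr << "clock_settime failed: " << std::strerror(err) << std::endl;
        return err;
    }
    return 0;
}
```

Calling try_reset_clock(CLOCK_PROCESS_CPUTIME_ID) before each sort in place of the unchecked clock_settime calls would show whether the reset is actually happening.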

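If I'm right, you are not resetting the clock as you think you are, so the second timing is cumulative with the first, plus some extra work you did not intend to count at all.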

The right way to do this is to call only clock_gettime: store the result first, do the computation, then subtract the start time from the end time.
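A rough sketch of that start/stop/subtract pattern follows. The helper names elapsed_seconds and time_host_sort are invented for illustration, and the sketch times std::sort on a plain std::vector so that it builds without CUDA. The clock ID is left as a parameter: the question uses CLOCK_PROCESS_CPUTIME_ID, which counts CPU time consumed by the process, while a wall-clock ID such as CLOCK_MONOTONIC is another option you might prefer when the work runs on the GPU.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>
#include <time.h>

// Hypothetical helper: returns (end - start) in seconds.
double elapsed_seconds(const timespec &start, const timespec &end)
{
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_nsec - start.tv_nsec) * 1e-9;
}

// Hypothetical helper: fills a vector with n pseudo-random ints and times
// std::sort on it. Returns the elapsed seconds, or a negative value if the
// clock could not be read.
double time_host_sort(std::size_t n, clockid_t clock_id)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = std::rand();
    }

    timespec start, end;
    if (clock_gettime(clock_id, &start) != 0) {  // read the clock, never set it
        return -1.0;
    }

    std::sort(v.begin(), v.end());               // the work being timed

    if (clock_gettime(clock_id, &end) != 0) {
        return -1.0;
    }
    return elapsed_seconds(start, end);
}
```

For example, std::cout << time_host_sort(1 << 23, CLOCK_MONOTONIC) << " seconds" << std::endl; prints the host timing. For the device measurement, the same two clock_gettime calls would go around the thrust::sort(D.begin(),D.end()) and thrust::copy(D.begin(),D.end(),dummy.begin()) lines from the question, each with its own start timestamp.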
