cuda - Execution time issue in CUDA benchmarks

Question

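I am trying to profile some of the CUDA Rodinia benchmarks in terms of their SM and memory utilization, power consumption, and so on. To do that, I run the benchmark and a profiler at the same time. The profiler essentially spawns a pthread that monitors GPU execution through the NVML library.

The problem is that the benchmark's execution time is much higher (about 3x) when I do not run the profiler alongside it than when I do. The CPU frequency scaling governor is set to userspace, so I don't think the CPU frequency is changing. Is it because the GPU frequency is fluctuating? The profiler code is below.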

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* usleep() */
#include "nvml.h"

#define NUM_THREADS     1

/* Polls NVML every 500 ms and prints power, utilization, and SM clock. */
void *PrintHello(void *threadid)
{
    nvmlReturn_t result;
    nvmlDevice_t device;
    nvmlUtilization_t utilization;
    unsigned int device_count, powergpu, clo;
    char version[80];

    (void)threadid;   /* the thread id is not used */

    result = nvmlInit();
    if (result != NVML_SUCCESS) {
        printf("nvmlInit failed: %s\n", nvmlErrorString(result));
        pthread_exit(NULL);
    }

    result = nvmlSystemGetDriverVersion(version, 80);
    if (result == NVML_SUCCESS)
        printf("\n Driver version: %s \n\n", version);

    result = nvmlDeviceGetCount(&device_count);
    if (result == NVML_SUCCESS)
        printf("Found %u device%s\n\n", device_count,
               device_count != 1 ? "s" : "");

    printf("Listing devices:\n");
    result = nvmlDeviceGetHandleByIndex(0, &device);
    if (result != NVML_SUCCESS) {
        printf("No handle for device 0: %s\n", nvmlErrorString(result));
        pthread_exit(NULL);
    }

    while (1)
    {
        /* Power draw is reported in milliwatts. */
        result = nvmlDeviceGetPowerUsage(device, &powergpu);
        if (result == NVML_SUCCESS)
            printf("\n%u\n", powergpu);

        /* GPU and memory utilization are percentages. */
        result = nvmlDeviceGetUtilizationRates(device, &utilization);
        if (result == NVML_SUCCESS)
        {
            printf("%u\n", utilization.gpu);
            printf("%u\n", utilization.memory);
        }

        /* Current SM clock in MHz. */
        result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_SM, &clo);
        if (result == NVML_SUCCESS)
            printf("%u\n", clo);

        usleep(500000);
    }

    pthread_exit(NULL);   /* not reached: the loop above never exits */
}

int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;

    for (t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Last thing that main() should do */
    pthread_exit(NULL);
}

score 1 · Accepted Answer

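With your profiler running, the GPUs are pulled out of their sleep state (because of the access to the nvml API, which is querying data from the GPUs). This makes them respond much more quickly to a CUDA application, so the application appears to run "faster" if you time the entire application execution (e.g. with the linux time command).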

One solution is to place the GPUs in "persistence mode" with the nvidia-smi command (use nvidia-smi --help to get command line help).

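Another solution is to do the timing from within the application and exclude the CUDA start-up time from the measurement, perhaps by executing a cuda command such as cudaFree(0); before timing starts.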
