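I am trying to measure the performance of my code (the execution of an OpenCL kernel), and I really need to know the speedup. I have tried both the clock() and clock_gettime() functions.
In the first case my code is simple and straightforward, and the measurements are correct: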
// BILLION (1e9), total and data are defined elsewhere in the program.
struct timespec start_r, start_m, stop_r, stop_m;
double realtime, monotonic;
clock_t start2 = clock();  // clock() returns clock_t, not time_t
if (clock_gettime(CLOCK_REALTIME, &start_r) == -1) {
    cout << "clock realtime error!" << endl;
}
if (clock_gettime(CLOCK_MONOTONIC, &start_m) == -1) {
    cout << "clock monotonic error!" << endl;
}
// Work being timed: sum the array.
double res = 0.0;
for (unsigned long i = 0; i < total; i++) {
    res += data[i];
}
cout << "res = " << res << endl;
clock_t end2 = clock();
if (clock_gettime(CLOCK_REALTIME, &stop_r) == -1) {
    cout << "clock realtime error!" << endl;
}
if (clock_gettime(CLOCK_MONOTONIC, &stop_m) == -1) {
    cout << "clock monotonic error!" << endl;
}
cout << "Time clock() = " << (end2 - start2) / (double)CLOCKS_PER_SEC << endl;
// Elapsed seconds = whole seconds + nanosecond remainder.
realtime = (stop_r.tv_sec - start_r.tv_sec) + (double)(stop_r.tv_nsec - start_r.tv_nsec) / (double)BILLION;
monotonic = (stop_m.tv_sec - start_m.tv_sec) + (double)(stop_m.tv_nsec - start_m.tv_nsec) / (double)BILLION;
cout << "Realtime = " << realtime << endl << "Monotonic = " << monotonic << endl;
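This gives sensible results: all three values are almost identical.
When I measure the execution time of an OpenCL kernel in exactly the same way, the results no longer agree: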
Time = 0.04
Realtime = 0.26113
Monotonic = 0.26113
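Can you tell me what is wrong here? If this is a common problem when measuring OpenCL kernel performance, could you suggest the best way to measure it? Thanks!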
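A possible explanation, offered as a sketch rather than a diagnosis: on POSIX systems clock() reports the CPU time consumed by the process, while clock_gettime() with CLOCK_REALTIME or CLOCK_MONOTONIC reports elapsed time. If the host thread spends most of the kernel's run time blocked waiting for the device, the two can legitimately differ, which would be consistent with the 0.04 versus 0.26 above. The example below does not use OpenCL at all. It substitutes a sleep for the blocked wait, and the names Timing, time_both and wait_like_blocked_host are hypothetical helpers invented for illustration.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: both measurements for one timed region.
struct Timing {
    double cpu_seconds;   // CPU time charged to this process, from clock()
    double wall_seconds;  // elapsed time, from a monotonic clock
};

// Runs `work` once and measures it with both clocks.
template <typename F>
Timing time_both(F&& work) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();
    work();
    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    return { static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
             std::chrono::duration<double>(w1 - w0).count() };
}

// Assumption: stands in for a host thread blocked on a device.
// Sleeping lets time pass while consuming almost no CPU time.
void wait_like_blocked_host() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

Calling time_both(wait_like_blocked_host) should typically report a wall time of about 0.2 s and a CPU time close to zero. For the kernel itself, elapsed time is the figure relevant to speedup. Note that clEnqueueNDRangeKernel only queues the work, so the stop timestamp should be taken after clFinish or clWaitForEvents returns. OpenCL also exposes device-side timestamps through clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) when the queue is created with CL_QUEUE_PROFILING_ENABLE.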