
I have a simulation application that I have written in both C and CUDA. To measure the speedup, I recorded the time in both cases. In CUDA I used CUDA events to measure the time, and then computed the speedup by dividing the CPU time by the GPU time (as is usually done). The speedup graph is provided below.

The weird thing about the speedup graph is that the speedup first increases to 55x, then decreases to 35x, and then increases again as the total number of threads grows. I am not sure why this is happening or how I could figure out the reason behind such an output. I am using a GTX 560 Ti GPU with 448 cores. The number of threads per block is 1024 (the maximum), so only 1 block at a time runs on each SM. Is this happening because of occupancy issues, and how could I definitively figure out the reason behind this kind of speedup graph?
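For reference, the CUDA-event timing mentioned above typically looks like the sketch below; `simulate_kernel`, `num_blocks`, and `d_data` are placeholders, not the actual application code:

```cuda
// Sketch of kernel timing with CUDA events (placeholder kernel and arguments).
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
simulate_kernel<<<num_blocks, 1024>>>(d_data);   // 1024 threads per block
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);                      // wait until the kernel has finished

float gpu_ms = 0.0f;
cudaEventElapsedTime(&gpu_ms, start, stop);      // elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);
```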



1 Answer


The speedup peaks seem to be related to the CPU execution time. Looking at the GPU time, it appears to increase linearly with the number of agents. The CPU time also generally increases linearly, but it has a dip in approximately the [0.6, 1.6] range and some peaks in approximately the [2.6, 3.1] range.

Given the above, your 55x maximum speedup decreases in the [0.6, 1.1] range because your CPU time also decreases there. Since the speedup is normally computed as CPU time / GPU time, the result is smaller. The same applies to the second range, [2.6, 3.1].

How can I figure out the reason behind this kind of speedup graph? My guess is that the CPU is being interrupted by some external events (I/O, other programs running on the CPU, the operating system, ...).

To compute the speedup more accurately, repeat the experiment 10 times as separate executions, i.e. do not create a loop inside the main function to run it 10 times. With 10, 20, 30 or even more separate executions, you can compute the average time as well as the variance. Then, study the execution times: one or two peaks can be treated as special cases (and ignored). If you see a trend, it deserves deeper study.

answered 2013-04-12T07:52:11.423