python - 使用 pycuda.driver.Event 测量时间会给出错误的结果

Question

我从PyCuda示例中运行 SimpleSpeedTest.py，产生以下输出：

Using nbr_values == 8192
Calculating 100000 iterations
SourceModule time and first three results:
0.058294s, [ 0.005477  0.005477  0.005477]
Elementwise time and first three results:
0.102527s, [ 0.005477  0.005477  0.005477]
Elementwise Python looping time and first three results:
2.398071s, [ 0.005477  0.005477  0.005477]
GPUArray time and first three results:
8.207257s, [ 0.005477  0.005477  0.005477]
CPU time measured using :
0.000002s, [ 0.005477  0.005477  0.005477]

前四个时间测量值是合理的，但最后一个（0.000002s）距离很远。CPU 结果应该是最慢的，但它比最快的 GPU 方法快几个数量级。所以显然测量的时间一定是错误的。这很奇怪，因为对于前四个结果，相同的计时方法似乎工作得很好。

所以我从 SimpleSpeedTest.py 中获取了一些代码，并制作了一个小测试文件[2]，它产生了：

time measured using option 1:
0.000002s 
time measured using option 2:
5.989620s

选项 1pycuda.driver.Event.record()使用（如在 SimpleSpeedTest.py 中）测量持续时间，选项 2使用time.clock(). 同样，选项 1 关闭，而选项 2 给出了合理的结果（运行测试文件所需的时间约为 6 秒）。

有谁知道为什么会这样？

由于 SimpleSpeedTest.py 支持使用选项 1，是否是我的设置导致了问题？我正在运行 GTX 470、显示驱动程序 301.42、CUDA 4.2、Python 2.7 64、PyCuda 2012.1、X5650 Xeon

[2]测试文件：

import numpy
import time
import pycuda.driver as drv
import pycuda.autoinit

n_iter = 100000
nbr_values = 8192 # = 64 * 128 (values as used in SimpleSpeedTest.py)

start = drv.Event() # option 1 uses pycuda.driver.Event
end = drv.Event()

a = numpy.ones(nbr_values).astype(numpy.float32) # test data

start.record() # start option 1 (inserting recording points into GPU stream)
tic = time.clock() # start option 2 (using CPU time)

for i in range(n_iter):
    a = numpy.sin(a) # do some work

end.record() # end option 1
toc = time.clock() # end option 2

end.synchronize() 

events_secs = start.time_till(end)*1e-3
time_secs = toc - tic 

print "time measured using option 1:"
print "%fs " % events_secs
print "time measured using option 2:"
print "%fs " % time_secs

score -1 · Accepted Answer

I contacted Andreas Klöckner and he suggested to synchronize on the start event, too.

...
start.record()
start.synchronize()
...

And this seems to solve the issue!

time measured using option 1:
5.944461s
time measured using option 2:
5.944314s

Apparently CUDA's behaviour changed in the last two years. I updated SimpleSpeedTest.py.

python - 使用 pycuda.driver.Event 测量时间会给出错误的结果

1 回答 1

Related

Reference