python - CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE in Python

Question

I get this error when I try to run the code below in Python using CUDA. I'm following this tutorial, but I'm running it on a Windows 7 x64 machine.

https://www.youtube.com/watch?v=jKV1m8APttU

I ran check_cuda() and all tests passed. Can anyone help me find out what exactly the issue is here?

My Code:

import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize, cuda

@vectorize(['float64(float64, float64)'], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float64)
    B = np.ones(N, dtype=np.float64)
    C = np.zeros(N, dtype=np.float64)

    start = timer()
    C = VectorAdd(A, B)
    vectoradd_time = timer() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))

    print("VectorAdd took %f seconds" % vectoradd_time)

if __name__ == '__main__':
    main()

Error Message:

---------------------------------------------------------------------------
CudaAPIError                              Traceback (most recent call last)
<ipython-input-18-2436fc2ab63a> in <module>()
      1 if __name__ == '__main__':
----> 2     main()

<ipython-input-17-64de53fdbe77> in main()
      7 
      8     start = timer()
----> 9     C = VectorAdd(A, B)
     10     vectoradd_time = timer() - start
     11 

C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in __call__(self, *args, **kws)
     93                       the input arguments.
     94         """
---> 95         return CUDAUFuncMechanism.call(self.functions, args, kws)
     96 
     97     def reduce(self, arg, stream=0):

C:\Anaconda2\lib\site-packages\numba\npyufunc\deviceufunc.pyc in call(cls, typemap, args, kws)
    297 
    298             devarys.extend([devout])
--> 299             cr.launch(func, shape[0], stream, devarys)
    300 
    301             if any_device:

C:\Anaconda2\lib\site-packages\numba\cuda\dispatcher.pyc in launch(self, func, count, stream, args)
    202 
    203     def launch(self, func, count, stream, args):
--> 204         func.forall(count, stream=stream)(*args)
    205 
    206     def is_device_array(self, obj):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args)
    193 
    194         return kernel.configure(blkct, tpb, stream=self.stream,
--> 195                                 sharedmem=self.sharedmem)(*args)
    196 
    197 class CUDAKernelBase(object):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in __call__(self, *args, **kwargs)
    357                           blockdim=self.blockdim,
    358                           stream=self.stream,
--> 359                           sharedmem=self.sharedmem)
    360 
    361     def bind(self):

C:\Anaconda2\lib\site-packages\numba\cuda\compiler.pyc in _kernel_call(self, args, griddim, blockdim, stream, sharedmem)
    431                                    sharedmem=sharedmem)
    432         # Invoke kernel
--> 433         cu_func(*kernelargs)
    434 
    435         if self.debug:

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in __call__(self, *args)
   1114 
   1115         launch_kernel(self.handle, self.griddim, self.blockdim,
-> 1116                       self.sharedmem, streamhandle, args)
   1117 
   1118     @property

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)
   1158                           hstream,
   1159                           params,
-> 1160                           None)
   1161 
   1162 

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in safe_cuda_api_call(*args)
    220         def safe_cuda_api_call(*args):
    221             retcode = libfn(*args)
--> 222             self._check_error(fname, retcode)
    223 
    224         setattr(self, fname, safe_cuda_api_call)

C:\Anaconda2\lib\site-packages\numba\cuda\cudadrv\driver.pyc in _check_error(self, fname, retcode)
    250             errname = ERROR_MAP.get(retcode, "UNKNOWN_CUDA_ERROR")
    251             msg = "Call to %s results in %s" % (fname, errname)
--> 252             raise CudaAPIError(retcode, msg)
    253 
    254     def get_device(self, devnum=0):

CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

score 2 · Accepted Answer

I found a solution to the problem through the NVIDIA developer forum. If you want more details about the solution, see this link.

https://devtalk.nvidia.com/default/topic/962843/cuda-programming-and-performance/cudaapierror-1-call-to-culaunchkernel-results-in-cuda_error_invalid_value-in-python/?offset=3#4968130

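In short:

When I changed N to 32000, or any other smaller number, it worked fine.
In fact, this means that I was not compiling it for the correct GPU type (check_cuda is the function call that verifies this).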
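
The linked thread is not reproduced here, so the following is only one plausible reading of why a smaller N helps. A 1D launch with one thread per element needs ceil(N / threads_per_block) blocks, and devices of compute capability 2.x cap the grid x-dimension at 65535 blocks, while compute capability 3.0 and later allow 2**31 - 1. The sketch below just does that arithmetic. The value of THREADS_PER_BLOCK is an assumption for illustration, not something taken from the question or from what numba actually picks.

```python
# Grid x-dimension limits per compute capability (from NVIDIA's CUDA docs).
MAX_GRID_X_CC2 = 65535             # compute capability 2.x
MAX_GRID_X_CC3_PLUS = 2**31 - 1    # compute capability 3.0 and later

# Assumed value for illustration only; the library chooses the real one.
THREADS_PER_BLOCK = 128


def blocks_needed(n_elements, threads_per_block):
    """Ceiling division: blocks for a 1D launch with one thread per element."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def fits_in_grid(n_elements, threads_per_block, max_grid_x):
    """True if the launch stays within the grid x-dimension limit."""
    return blocks_needed(n_elements, threads_per_block) <= max_grid_x


for n in (32000, 32000000):
    print(n,
          blocks_needed(n, THREADS_PER_BLOCK),
          fits_in_grid(n, THREADS_PER_BLOCK, MAX_GRID_X_CC2),
          fits_in_grid(n, THREADS_PER_BLOCK, MAX_GRID_X_CC3_PLUS))
```

Under that assumed block size, N = 32000000 fits only within the larger limit, which would be consistent with code built for an older GPU type failing at launch.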

I hope my answer helps someone.

score 0 · Accepted Answer

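This might mean that you are trying to run more threads in one block than is actually allowed. That was the case for me. So try to split your execution into several blocks.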
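
When the library picks the launch configuration for you, as it does for a vectorized function, one way to keep each launch small is to feed it the input in slices. Here is a minimal sketch of that idea. chunk_bounds and add_in_chunks are hypothetical helper names, cpu_add is a CPU stand-in for VectorAdd so the snippet runs without a GPU, and the chunk size is arbitrary.

```python
def chunk_bounds(n_elements, chunk_size):
    """Yield (start, stop) index pairs that together cover range(n_elements)."""
    for start in range(0, n_elements, chunk_size):
        yield start, min(start + chunk_size, n_elements)


def add_in_chunks(add_func, a, b, out, chunk_size):
    """Call add_func on one slice at a time and store each result in out."""
    for start, stop in chunk_bounds(len(a), chunk_size):
        out[start:stop] = add_func(a[start:stop], b[start:stop])
    return out


def cpu_add(x, y):
    """CPU stand-in for the GPU ufunc, used only to make this runnable."""
    return [p + q for p, q in zip(x, y)]


a = [1.0] * 10
b = [1.0] * 10
c = add_in_chunks(cpu_add, a, b, [0.0] * 10, chunk_size=4)
print(c)
```

With the code from the question, the same helper could be called as add_in_chunks(VectorAdd, A, B, C, 1000000), since slice assignment works on NumPy arrays as well. Whether that chunk size is small enough depends on the GPU.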
