I'm trying to use CUDA Python with Numba. The code below computes the sum of a 1-D array, but I don't know how to get a single result instead of three values.
Python 3.5 with Numba + CUDA 8.0
```python
import os, sys, time
import pandas as pd
import numpy as np
from numba import cuda, float32

os.environ['NUMBAPRO_NVVM'] = r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\bin\nvvm64_31_0.dll'
os.environ['NUMBAPRO_LIBDEVICE'] = r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\libdevice'

bpg = (1, 1)  # blocks per grid
tpb = (1, 3)  # threads per block: 3 threads along y

@cuda.jit
def calcu_sum(D, T):
    ty = cuda.threadIdx.y
    bh = cuda.blockDim.y
    index_i = ty
    L = len(D)
    su = 0
    # Each thread sums every bh-th element, starting at its own index ty
    while index_i < L:
        su += D[index_i]
        index_i += bh
    print('su:', su)
    T[0, 0] = su  # every thread writes to the same cell
    print('T:', T[0, 0])

D = np.array([0.42487645, 0.41607881, 0.42027071, 0.43751907, 0.43512794, 0.43656972,
              0.43940639, 0.43864551, 0.43447691, 0.43120232], dtype=np.float32)
T = np.empty([1, 1])
print('D: ', D)

stream = cuda.stream()
with stream.auto_synchronize():
    dD = cuda.to_device(D, stream)
    dT = cuda.to_device(T, stream)  # was `TE`, which is undefined
    calcu_sum[bpg, tpb, stream](dD, dT)
```
The output is:

```
D:  [ 0.42487645  0.41607881  0.42027071  0.43751907  0.43512794  0.43656972
  0.43940639  0.43864551  0.43447691  0.43120232]
su: 1.733004
su: 1.289852
su: 1.291317
T: 1.733004
T: 1.289852
T: 1.291317
```
Why can't I get the output "4.31417383" instead of "1.733004 1.289852 1.291317"? 1.733004 + 1.289852 + 1.291317 = 4.314173.
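The three numbers can be reproduced on the host by replaying the strided loop in `calcu_sum`. This is a minimal NumPy sketch, assuming the launch configuration above (one block of 3 threads along y); the helper name `strided_partial_sums` is hypothetical and not part of Numba.

```python
import numpy as np

def strided_partial_sums(D, n_threads):
    # Thread ty visits D[ty], D[ty + n_threads], D[ty + 2*n_threads], ...
    # which is exactly the slice D[ty::n_threads].
    return [D[ty::n_threads].sum() for ty in range(n_threads)]

D = np.array([0.42487645, 0.41607881, 0.42027071, 0.43751907, 0.43512794,
              0.43656972, 0.43940639, 0.43864551, 0.43447691, 0.43120232],
             dtype=np.float32)

partials = strided_partial_sums(D, 3)
print(partials)       # one partial sum per thread
print(sum(partials))  # the total the question expects
```

Each thread only ever holds its own `su`, and each one overwrites `T[0, 0]`, so nothing in the kernel adds the three partial sums together; some form of reduction across threads is needed for that.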
I'm new to Numba and have read the Numba documentation, but I still don't know how to do this. Can anyone give me some advice?