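I am trying to write a pycuda wrapper, inspired by the scikits-cuda library, for some of the operations provided in Nvidia's new cuSolver library. I want to solve a linear system of the form AX=B via LU factorization. First I perform the factorization with the cublasSgetrfBatched method from scikits-cuda, which gives me the LU factors. Then, using that factorization, I want to solve the system with cusolverDnSgetrs from cuSolver, which is the function I want to wrap. When I run the computation it returns status 3, and the matrix that is supposed to hold the answer is unchanged, yet *devInfo is zero. The cuSolver documentation says:
CUSOLVER_STATUS_INVALID_VALUE = an unsupported value or parameter was passed to the function (for example, a negative vector size).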
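To make the numeric return value easier to read while debugging, a small lookup table can translate it into the enum name. This is only a sketch: the helper name `cusolver_status_name` is hypothetical, and the table lists only the `cusolverStatus_t` values 0 to 8 (newer cuSolver releases may define more).

```python
# Numeric values of the cusolverStatus_t enum (values 0-8 only;
# later cuSolver versions may add further codes).
CUSOLVER_STATUS = {
    0: 'CUSOLVER_STATUS_SUCCESS',
    1: 'CUSOLVER_STATUS_NOT_INITIALIZED',
    2: 'CUSOLVER_STATUS_ALLOC_FAILED',
    3: 'CUSOLVER_STATUS_INVALID_VALUE',
    4: 'CUSOLVER_STATUS_ARCH_MISMATCH',
    5: 'CUSOLVER_STATUS_MAPPING_ERROR',
    6: 'CUSOLVER_STATUS_EXECUTION_FAILED',
    7: 'CUSOLVER_STATUS_INTERNAL_ERROR',
    8: 'CUSOLVER_STATUS_MATRIX_TYPE_NOT_SUPPORTED',
}


def cusolver_status_name(status):
    """Hypothetical helper: return the enum name for a status code."""
    return CUSOLVER_STATUS.get(status, 'UNKNOWN_STATUS_%d' % status)


# The status returned in this question is 3.
name = cusolver_status_name(3)
print(name)
```

With the status from my run this gives the name quoted from the documentation above.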
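import ctypes
import numpy as np
import pycuda.gpuarray as gpuarray

# libcusolver and _types come from my wrapper module (not shown here).
libcusolver.cusolverDnSgetrs.restype = int
libcusolver.cusolverDnSgetrs.argtypes = [_types.handle,    # handle
                                         ctypes.c_char,    # trans
                                         ctypes.c_int,     # n
                                         ctypes.c_int,     # nrhs
                                         ctypes.c_void_p,  # A
                                         ctypes.c_int,     # lda
                                         ctypes.c_void_p,  # devIpiv
                                         ctypes.c_void_p,  # B
                                         ctypes.c_int,     # ldb
                                         ctypes.c_void_p]  # devInfo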
"""
handle is the handle pointer returned by calling cusolverDnCreate() from cuSolver
LU is the LU-factorized matrix returned by cublasSgetrfBatched() from scikits-cuda
P is the pivot array returned by cublasSgetrfBatched()
B is the right-hand-side matrix from AX=B
"""
def cusolverSolveLU(handle, LU, P, B):
    rows_LU, cols_LU = LU.shape
    rows_B, cols_B = B.shape
    B_gpu = gpuarray.to_gpu(B.astype('float32'))
    info_gpu = gpuarray.zeros(1, np.int32)
    status = libcusolver.cusolverDnSgetrs(
        handle, 'n', rows_LU, cols_B,
        int(LU.gpudata), cols_LU,
        int(P.gpudata), int(B_gpu.gpudata),
        cols_B, int(info_gpu.gpudata))
    print(info_gpu)
    print(status)
handle = cusolverCreate()  # initialize cuSolver and get the handle
LU, P = cublasLUFactorization(...)
B = np.asarray(np.random.rand(3, 3), np.float32)
cusolverSolveLU(handle, LU, P, B)
Output:
[0]
3
What am I doing wrong?
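One detail that may be worth checking, offered as a guess rather than a confirmed diagnosis: in the C API the second parameter of cusolverDnSgetrs is a cublasOperation_t, which is an enum (CUBLAS_OP_N = 0, CUBLAS_OP_T = 1, CUBLAS_OP_C = 2), whereas the argtypes above declare it as ctypes.c_char and the call passes 'n'. The sketch below needs no GPU and only compares the value each declaration would carry. The helper `trans_to_enum` is hypothetical and not part of cuSolver or scikits-cuda.

```python
import ctypes

# Values of the cublasOperation_t enum in the cuBLAS C API.
CUBLAS_OP_N, CUBLAS_OP_T, CUBLAS_OP_C = 0, 1, 2


def trans_to_enum(trans):
    """Hypothetical helper: map 'n'/'t'/'c' (or 0/1/2) to a cublasOperation_t value."""
    if isinstance(trans, int):
        if trans not in (CUBLAS_OP_N, CUBLAS_OP_T, CUBLAS_OP_C):
            raise ValueError('invalid cublasOperation_t value: %r' % trans)
        return trans
    if isinstance(trans, bytes):
        trans = trans.decode('ascii')
    table = {'n': CUBLAS_OP_N, 't': CUBLAS_OP_T, 'c': CUBLAS_OP_C}
    try:
        return table[trans.lower()]
    except KeyError:
        raise ValueError('invalid trans flag: %r' % trans)


# A c_char argument carries the ASCII code of the character ...
as_char = ord(ctypes.c_char(b'n').value)
# ... while the enum value for "no transpose" is a plain integer.
as_enum = ctypes.c_int(trans_to_enum('n')).value

print(as_char, as_enum)
```

If this turns out to be the cause, the change to try would be declaring that parameter as ctypes.c_int and passing trans_to_enum('n') instead of 'n'.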