cuda - Is it possible to call CUDA CUBLAS functions from a global or device function?

Question

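I am trying to parallelize an existing application. I have parallelized most of it and it now runs on the GPU, but I am having trouble migrating one function to the GPU.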

The function calls dtrsv, which is part of the BLAS library; see below.

void dtrsv_call_N(double* B, double* A, int* n, int* lda, int* incx) {
  F77_CALL(dtrsv)("L","T","N", n, B, lda, A, incx);
}

I have been able to call the equivalent CUDA/cuBLAS function as shown below, and it produces results equivalent to the Fortran dtrsv subroutine.

status = cublasDtrsv(handle,CUBLAS_FILL_MODE_LOWER,CUBLAS_OP_T,CUBLAS_DIAG_NON_UNIT, x, dev_m1, x, dev_m2, c);

if (status != CUBLAS_STATUS_SUCCESS) {
    printf("!!!! kernel execution error.\n");
    return EXIT_FAILURE;
}

My problem is that I need to be able to call cublasDtrsv from a device or global function, as shown below:

__global__ void Dtrsv__cm2(cublasHandle_t handle,cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const double *A, int lda, double *x, int incx){
    cublasDtrsv(handle,uplo,trans,diag, n, A, lda, x, incx);
}

With CUDA 4.0, if I try to compile this, I get the error below. Does anyone know whether there is a way to call cuBLAS functions from a __device__ or __global__ function?

error: calling a host function("cublasDtrsv_v2") from a __device__/__global__ function("Dtrsv__dev") is not allowed

score 5 · Accepted Answer

CUDA Toolkit 5.0 introduced a device linker that can link separately compiled device object files. I believe CUBLAS functions from CUDA Toolkit 5.0 can now be called from device functions (though I have only reviewed the headers and have no experience using CUBLAS).
