c - calling CUDA compiled .dll from R

Question

I am using Windows 7 platform.

Below I describe, step by step, how I build the .dll file (PASS), load it in R with dyn.load (PASS) and invoke the .Call function in R (FAIL).

When invoking .Call I get:

> out<- .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m))
Error in .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m)) : 
  C symbol name "rowAND" not in load table

1) The source code is below:

#include <stdio.h>
#include <math.h>

#include <cuda_runtime.h>
#include <cuda.h>
#include <device_launch_parameters.h>

#include <R.h>
#include <Rdefines.h>

#include "cuPrintf.cuh"
#include "cuPrintf.cu"

#include "cuRow.h"
#include "cuError.h"

extern "C" { 
SEXP rowAND(SEXP x, SEXP r_nrow, SEXP r_ncol) {
    // input: 
    //              x=as.integer(t(m)), vector of integer values from R (t(m) because store values by col)
    //              r_nrow=nrow(m), scalar
    //              r_ncol=ncol(m), scalar

    //x = coerceVector(x, INTSXP); // force coercion to an integer vector

    // define dimensions
    int nrow = asInteger(r_nrow);
    int ncol = asInteger(r_ncol);
    size_t m_size;
    size_t calc_size;
    m_size = nrow * ncol * sizeof(int); // m (input)
    calc_size = nrow * sizeof(int); // change to nrow/ncol depending on calculation (output)

    // R
    SEXP r;
    PROTECT(r = allocMatrix(INTSXP,nrow,1));

    // cuda error variable
    cudaError_t err;

    // allocate HOST 
    int *h_m = INTEGER(x);
    int *h_calc = INTEGER(r);

    // allocate DEVICE
    int *d_m = NULL, *d_calc = NULL;
    err = cudaMalloc((void **)&d_m, m_size); checkError(err);
    err = cudaMalloc((void **)&d_calc, calc_size); checkError(err);

    // copy host matrix to device
    err = cudaMemcpy(d_m, h_m, m_size, cudaMemcpyHostToDevice); checkError(err);

    // Initialize cuPrintf -- DEBUGGING
    cudaPrintfInit();

    dim3 numBlocks(nrow,1,1); // blocks
    dim3 threadsPerBlock(1,1,1); // 1 thread per block
    rowOR<<<numBlocks, threadsPerBlock,0,0>>>(d_m, d_calc, ncol); // main call

    // Terminate cuPrintf -- DEBUGGING
    cudaPrintfDisplay (stdout, true);
    cudaPrintfEnd ();

    err = cudaGetLastError(); checkError(err);

    // Copy the device result vector in device memory to the host result vector
    err = cudaMemcpy(h_calc, d_calc, calc_size, cudaMemcpyDeviceToHost); checkError(err);

    // Free device global memory
    err = cudaFree(d_m); checkError(err);
    err = cudaFree(d_calc); checkError(err);

    // Reset the device
    err = cudaDeviceReset();

    UNPROTECT(1);
    return r;
}
} // extern "C"

2) I compile the .cu file with nvcc, which generates the object file (.obj). Then I link the libraries (PASS, no problem here), which generates the .dll file.

3) When I load the .dll with the R command dyn.load, it passes. The loaded .dll appears in getLoadedDLLs():

> getLoadedDLLs()
                                                                                  Filename Dynamic.Lookup
base                                                                                  base          FALSE
methods       C:/Revolution/R-Community-6.2/R-2.15.3/library/methods/libs/i386/methods.dll          FALSE
Revobase    C:/Revolution/R-Community-6.2/R-2.15.3/library/Revobase/libs/i386/Revobase.dll           TRUE
tools             C:/Revolution/R-Community-6.2/R-2.15.3/library/tools/libs/i386/tools.dll          FALSE
grDevices C:/Revolution/R-Community-6.2/R-2.15.3/library/grDevices/libs/i386/grDevices.dll          FALSE
stats             C:/Revolution/R-Community-6.2/R-2.15.3/library/stats/libs/i386/stats.dll          FALSE
cuRow           C:/Users/msn/Documents/Visual Studio 2010/Projects/R_C/R_C/Debug/cuRow.dll           TRUE

4) Here comes the problem: when I check whether the function rowAND is loaded, I get FALSE:

> is.loaded("rowAND")
[1] FALSE
>

So, obviously, .Call fails when I run it (because the symbol is not loaded):

> path.dll<-'C:/Users/msn/Documents/Visual Studio 2010/Projects/R_C/R_C/Debug'
> dyn.load(file.path(path.dll,paste0("cuRow", .Platform$dynlib.ext)))
> nrow<-10
> ncol<-3
> m<-matrix(sample(c(0,1),nrow*ncol,replace=TRUE),nrow,ncol)
> out<- .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m))
Error in .Call("rowAND", as.integer(t(m)), nrow(m), ncol(m)) : 
  C symbol name "rowAND" not in load table

I see that the function appears to be correctly defined in the source code, but it can't be "seen" in the loaded library.

What am I missing here? Thanks in advance!

EDIT:

Based on @Dirk's partial answer, I will try to write a CUDA DLL project that is called from C, so that I can compile the target C source with the standard R CMD SHLIB.

That is: a C DLL, loaded into R, which calls the CUDA DLL internally.

Will update when done!

EDIT 2:

I answered my own question below. I finally got a CUDA implementation working in R (Windows platform).

score 4 · Accepted Answer

I decided to post an answer to my own question for anyone running into the same difficulties. I would categorize it as a workaround to the problem.

In the end, my goal was to use CUDA GPU parallelism from R on the Windows platform.

I see that the majority of CRAN packages (if not all) that use CUDA have no binaries for Windows. In other words, if you try to build them from source on Windows, the build fails. I guess they haven't been built for Windows because it is tricky to compile and link .cu files on Windows using MinGW and the nvcc compiler together.

NVIDIA treats VS2010 as the main platform for Windows development, and the Eclipse plug-in is only supported on Linux. Although nvcc supports the -ccbin option, which can make it call gcc, configuring the toolchain is really tricky.

My workaround was to develop a DLL project in VS2010 and to compile and link the DLL with VS2010's native compiler/linker, cl.

This DLL is the piece that internally runs the CUDA GPU code.

After compiling it in VS2010, I loaded the DLL with dyn.load() and called its functions from R using .C.

It finally worked, and I could bring CUDA GPU parallelism to R on a Windows platform.

I could ship the same .dll in a package, using NAMESPACE, and provide the DLL source code inside the CRAN tarball, so as not to infringe open-source policies. Anyway, it is a workaround.

Two important factors:

1) Export all functions with C linkage, using extern "C".

2) Declare all function arguments as pointers, since this is mandatory for '.C' calls.
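To make the two points concrete, here is a minimal sketch of what such an exported function might look like. It is not the code from my actual project: the name `row_and_c`, its argument layout and the `EXPORT` macro are assumptions made for illustration, and a plain CPU loop stands in for the CUDA kernel launch so that the sketch compiles without nvcc.

```cpp
// Hypothetical export macro: __declspec(dllexport) on Windows, nothing elsewhere.
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

// extern "C" disables C++ name mangling, so R can look the symbol up by name.
// Every argument is a pointer because .C() passes all arguments by address.
extern "C" EXPORT void row_and_c(const int *x, const int *nrow,
                                 const int *ncol, int *out)
{
    // x holds the matrix row by row (as.integer(t(m)) on the R side).
    for (int i = 0; i < *nrow; ++i) {
        int acc = 1;
        for (int j = 0; j < *ncol; ++j) {
            // CPU stand-in for the GPU kernel: logical AND across row i.
            acc = acc && (x[i * (*ncol) + j] != 0);
        }
        out[i] = acc;
    }
}
```

Assuming the DLL has been loaded with dyn.load(), the R side would look something like `.C("row_and_c", as.integer(t(m)), nrow(m), ncol(m), out = integer(nrow(m)))$out`.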

score 2 · Accepted Answer

What I would do in your case is look very closely at the existing R packages for CUDA that are publicly available on CRAN, as they provide working implementations. I believe at least some of these build on Windows too.

Among the CRAN packages using CUDA are

and more. See the CRAN Task View on High-Performance Computing for more.

I am most familiar with the first (and oldest) one. It uses one layer of code to call from R to C, and then another to call from C to the CUDA-enabled code compiled with NVidia's compiler frontend. The last one uses Rcpp for the passage from R to C/C++. I suspect your error is due to trying to skip one step.
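The layering could be sketched roughly as below. This is only an illustration of the idea, not code taken from any of those packages: the names `launch_row_and` and `r_row_and` are hypothetical, and the body of `launch_row_and` is a CPU placeholder. In a real project it would live in a `.cu` file compiled by nvcc and would do the device allocation, copies and kernel launch, while the thin wrapper would be built by the regular C/C++ toolchain.

```cpp
// Layer 2 (would live in a .cu file built with nvcc): hypothetical launcher.
// A CPU loop is used here as a placeholder for the device code.
// Returns 0 on success, 1 on invalid arguments.
extern "C" int launch_row_and(const int *m, int nrow, int ncol, int *result)
{
    if (m == nullptr || result == nullptr || nrow < 0 || ncol < 0) {
        return 1;
    }
    for (int i = 0; i < nrow; ++i) {
        int acc = 1;
        for (int j = 0; j < ncol; ++j) {
            acc = acc && (m[i * ncol + j] != 0);  // AND across row i
        }
        result[i] = acc;
    }
    return 0;
}

// Layer 1 (built by the ordinary C/C++ compiler): the entry point R sees.
// It only unpacks the pointer arguments and forwards to the launcher.
extern "C" void r_row_and(int *m, int *nrow, int *ncol,
                          int *result, int *status)
{
    *status = launch_row_and(m, *nrow, *ncol, result);
}
```

Keeping the R-facing wrapper free of CUDA headers means only the launcher has to go through nvcc, which is the separation of steps suggested above.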

score 1 · Accepted Answer

There are several important things to get right when building CUDA code for R with Visual Studio on Windows.

Declare the C function with the __declspec(dllexport) keyword (rather than extern "C" alone):
```
extern "C" __declspec(dllexport)
```
Build the DLL for the same architecture as R (32- or 64-bit); otherwise, loading the DLL in R fails with:

LoadLibrary failure: %1 is not a valid Win32 application.

Link the CUDA libraries correctly in Visual Studio. In general, they are added under:

Solution Explorer → Project name → Properties → Linker → Input → Additional Dependencies

For the remaining detailed steps, refer to the NVIDIA blog and ParallelR.
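As a small illustration of the export and architecture points, the sketch below combines the export declaration with a compile-time check of the pointer width. The names `DLL_EXPORT`, `EXPECTED_R_BITS` and `dll_pointer_bits` are made up for this example and are not part of R, CUDA or Visual Studio.

```cpp
#include <climits>

// Hypothetical export macro following the declaration shown above.
#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical build-time setting: define EXPECTED_R_BITS as 32 or 64
// to match the R installation. It defaults to the width of the current
// build so that the sketch compiles as-is.
#ifndef EXPECTED_R_BITS
#define EXPECTED_R_BITS (sizeof(void *) * CHAR_BIT)
#endif

// Fails the build early instead of failing later at dyn.load().
static_assert(sizeof(void *) * CHAR_BIT == EXPECTED_R_BITS,
              "DLL architecture does not match the targeted R build");

// Exported helper reporting the pointer width this DLL was compiled for.
DLL_EXPORT int dll_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

On the R side, `.Machine$sizeof.pointer` gives 4 for a 32-bit session and 8 for a 64-bit one, which tells you which value to build for.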
