I am currently trying to find an implementation that allows consecutive function calls, where each call needs access to every element of a large matrix (up to 1.5e9 double entries). I used the bigmemory package to handle the matrix, together with Rcpp for the function operations. To be more concrete, see the following code.
C++ code:
// [[Rcpp::depends(BH, bigmemory)]]
#include <Rcpp.h>
#include <bigmemory/BigMatrix.h>
#include <bigmemory/MatrixAccessor.hpp>
using namespace Rcpp;

// Sums all entries; mat[j][i] addresses row i of column j.
// [[Rcpp::export]]
double IterateBigMatrix2(SEXP pBigMat, int n_row, int n_col) {
  XPtr<BigMatrix> xpMat(pBigMat);
  MatrixAccessor<double> mat(*xpMat);
  double sum = 0;
  for (int i = 0; i < n_row; i++) {
    for (int j = 0; j < n_col; j++) {
      sum += mat[j][i];
    }
  }
  return sum;
}
Function call in R:
#Set up big.matrix
library(bigmemory)

nrows <- 2e7
ncols <- 50
bkFile <- "bigmat.bk"
descFile <- "bigmat.desc"
bigmat <- filebacked.big.matrix(nrow = nrows, ncol = ncols, type = "double",
                                backingfile = bkFile, backingpath = ".",
                                descriptorfile = descFile,
                                dimnames = list(NULL, NULL))
#Consecutive function calls
IterateBigMatrix2(bigmat@address, nrows, ncols)
IterateBigMatrix2(bigmat@address, nrows, ncols)
Unfortunately, the consecutive function calls slow down dramatically as nrows or ncols increases. Note that at the sizes above the matrix already holds 2e7 * 50 = 1e9 doubles, i.e. roughly 8 GB in the backing file.
My question:
Is this because accessing the big.matrix elements evicts the first cached pages once RAM is exhausted, while the next function call needs exactly those 'first' elements again? If so, is there a better (faster) way to access the elements in the loops, or to influence which cached pages get evicted? For example, I wondered whether swapping the loop order, so that the inner loop walks down a single column and thus matches the column-major storage of big.matrix, is the right approach; a sketch of what I mean follows below.
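Here is a minimal sketch of that column-wise variant (the name IterateBigMatrixColwise is mine, and I have not verified that it removes the slowdown; it only changes the traversal order so that the memory-mapped pages are read sequentially):

// [[Rcpp::depends(BH, bigmemory)]]
#include <Rcpp.h>
#include <bigmemory/BigMatrix.h>
#include <bigmemory/MatrixAccessor.hpp>
using namespace Rcpp;

// Hypothetical column-wise variant: the inner loop walks down one column,
// so the file-backed pages are touched sequentially instead of strided.
// [[Rcpp::export]]
double IterateBigMatrixColwise(SEXP pBigMat, int n_row, int n_col) {
  XPtr<BigMatrix> xpMat(pBigMat);
  MatrixAccessor<double> mat(*xpMat);
  double sum = 0;
  for (int j = 0; j < n_col; j++) {
    double *col = mat[j];    // pointer to the start of column j
    for (int i = 0; i < n_row; i++) {
      sum += col[i];         // sequential access within the column
    }
  }
  return sum;
}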
Thank you very much for any help!