I am currently trying to find an implementation that allows for consecutive function calls, where each call needs access to every element of a big matrix (up to 1.5e9 entries of type double).

I used the bigmemory package for handling the matrix together with Rcpp for the function operations.

To be a bit more explicit, see the following code.

C++ code:

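// [[Rcpp::depends(BH, bigmemory)]]
#include <Rcpp.h>
#include <bigmemory/MatrixAccessor.hpp>
using namespace Rcpp;

// Sums all elements of a big.matrix, given its external pointer address.
// [[Rcpp::export]]
double IterateBigMatrix2(SEXP pBigMat, int n_row, int n_col) {
  XPtr<BigMatrix> xpMat(pBigMat);
  MatrixAccessor<double> mat(*xpMat);
  double sum = 0;
  // Outer loop over rows, inner loop over columns; mat[j][i] is row i, column j.
  for (int i = 0; i < n_row; i++) {
    for (int j = 0; j < n_col; j++) {
      sum += mat[j][i];
    }
  }
  return sum;
}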

Function call in R:

library(bigmemory)

# Set up big.matrix
nrows <- 2e7
ncols <- 50
bkFile <- "bigmat.bk"
descFile <- "bigmatk.desc"
bigmat <- filebacked.big.matrix(nrow=nrows, ncol=ncols, type="double",
                                backingfile=bkFile, backingpath=".",
                                descriptorfile=descFile,
                                dimnames=c(NULL,NULL))
#Consecutive function calls
IterateBigMatrix2(bigmat@address,nrows,ncols)
IterateBigMatrix2(bigmat@address,nrows,ncols)

Unfortunately, the consecutive function calls slow down drastically at some point as n_row or n_col increases.

My question:

Is this because accessing the big.matrix elements evicts the first cached elements once RAM is exceeded, while each subsequent function call needs exactly those 'first' elements again? If so, is there a better-performing way to access the elements in the loops, or a way to control which cached elements are dropped?

Thank you very much for any help!

1 Answer

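big.matrix objects, like standard R matrices, are stored column by column. This means the matrix is really one long vector, made up of the columns laid end to end.

What this basically tells you is to always access the matrix column by column rather than row by row, so that you read data from contiguous memory ("locality of access").

So just switch the two loops and you should be fine.

PS: you don't need to pass n_row and n_col. You can get them with xpMat->nrow() and xpMat->ncol(), or with mat.nrow() and mat.ncol().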
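
To illustrate both points without depending on bigmemory, here is a minimal standalone sketch. ColMajorView and sum_by_column are hypothetical names made up for this example; they are not part of the bigmemory API. The view only mimics the mat[j][i] indexing from the question on top of a plain std::vector, carries its own dimensions, and is summed with the column loop on the outside so that memory is read in storage order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major accessor (NOT the bigmemory API).
// It does not own the data; the vector must outlive the view.
class ColMajorView {
public:
    ColMajorView(const std::vector<double>& data, std::size_t n_row, std::size_t n_col)
        : data_(data), n_row_(n_row), n_col_(n_col) {
        assert(data.size() == n_row * n_col);
    }

    // The view knows its own dimensions, so callers need not pass them.
    std::size_t nrow() const { return n_row_; }
    std::size_t ncol() const { return n_col_; }

    // Pointer to the start of column j, so view[j][i] is (row i, column j).
    const double* operator[](std::size_t j) const {
        return data_.data() + j * n_row_;
    }

private:
    const std::vector<double>& data_;
    std::size_t n_row_;
    std::size_t n_col_;
};

// Sum every element: columns in the outer loop, rows in the inner loop,
// which walks the underlying vector from front to back.
double sum_by_column(const ColMajorView& view) {
    double sum = 0.0;
    for (std::size_t j = 0; j < view.ncol(); ++j) {
        const double* col = view[j];
        for (std::size_t i = 0; i < view.nrow(); ++i) {
            sum += col[i];
        }
    }
    return sum;
}
```

In the Rcpp function from the question, the corresponding change would be to put the loop over j outside the loop over i and take the bounds from the matrix object instead of the n_row and n_col arguments.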

Answered 2017-11-09T16:49:06.763