0

I'm trying to speed up code that takes time series data and limits it to a maximum value and then stretches it forward until sum of original data and the "stretched" data are the same.

I have a more complicated version of this that is taking 6 hours to run on 100k rows. I don't think this is vectorizable because it uses values calculated on prior rows - is that correct?

x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x,y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
  if(dat$x[i] >= upperlimit){
    dat$y[i]  <- upperlimit
  } else {
    dat$y[i] <- min(remainder,upperlimit)
  }
  remainder  <-  remainder + dat$x[i] - dat$y[i]
}
dat

I understand you can use ifelse but I don't think cumsum can be used to carry forward the remainder - apply doesn't help either as far as I know. Do I need to resort to Rcpp? Thank you greatly.

4

1 回答 1

0

我继续并在其中实现了这一点,Rcpp并对R功能进行了一些调整:

require(Rcpp);require(microbenchmark);require(ggplot2);

limitstretchR <- function(upperlimit,original) {
  remainder  <- 0
  out <- vector(length=length(original))
  for(i in 1:length(original)){
    if(original[i] >= upperlimit){
      out[i]  <- upperlimit
    } else {
      out[i] <- min(remainder,upperlimit)
    }
    remainder  <-  remainder + original[i] - out[i]
  }
  out
}

Rcpp功能:

cppFunction('
  NumericVector limitstretchC(double upperlimit, NumericVector original) {
    int n = original.size();
    double remainder = 0.0;
    NumericVector out(n);
    for(int i = 0; i < n; ++i) {
        if (original[i] >= upperlimit) {
          out[i] = upperlimit;
        } else {
          out[i] = std::min<double>(remainder,upperlimit);
        }
      remainder = remainder + original[i] - out[i];
    }
    return out;
  }
')

测试它们:

x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))

这分别产生了 80.655 和 0.001 秒。本机R对此非常不利。但是,我运行了一个microbenchmark(使用较小的向量)并得到了一些令人困惑的结果。

res <- microbenchmark(list=
    list(limitstretchR=limitstretchR(upperlimit,rep(x,10000)),
    limitstretchC=limitstretchC(upperlimit,rep(x,10000))),
        times=110,
        control=list(order="random",warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)

如果您要运行它,您会看到两个函数的结果几乎相同。这是我第一次使用microbenchmark,有什么建议吗?

于 2013-09-18T18:46:33.100 回答