You can write a faster R function:
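longest_rollsum_R <- function(x, threshold) {
  # TRUE if every window of k + 1 consecutive elements sums to at least threshold;
  # returns FALSE at the first window that falls short (short-circuit)
  R_rollsum <- function(k) {
    # seq_len() gives an empty loop when k == length(x), where seq() would give c(1, 0)
    for (i in seq_len(length(x) - k))
      if (sum(x[i:(i + k)]) < threshold) return(FALSE)
    TRUE
  }
  # return the smallest k for which all windows pass
  for (k in seq_along(x)) if (R_rollsum(k)) return(k)
}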
Now compare it with the approaches above:
Unit: milliseconds
             expr       min        lq      mean    median        uq       max neval
      rollsum(df) 23.440017 25.407785 29.007619 27.906883 31.014434 45.516888   100
 rollsum_rcpp(df)  4.046400  4.499688  5.253406  4.734718  5.596438 14.079618   100
   rollsum_R1(df)  3.798058  4.194639  5.100568  4.710468  5.280267 14.749520   100
The difference from rollsum_rcpp looks small here, but it becomes substantial once we change the threshold:
Unit: milliseconds
                 expr        min         lq       mean    median         uq       max neval
      rollsum(df, 20) 111.336885 130.055676 142.683567 138.11347 147.231358 306.06438   100
 rollsum_rcpp(df, 20)  11.640328  13.170309  15.166030  14.03039  16.060333  31.23998   100
   rollsum_R1(df, 20)   5.993384   7.128607   8.125868   7.54488   8.140206  19.86842   100
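You can also write your own short-circuiting code in Rcpp, which does the job faster than both approaches given so far.
Save the following in your working directory as longest_rollsum.cpp: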
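#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int longest_rollsum_C(NumericVector x, double threshold) {
  // true if every window of k + 1 consecutive elements sums to at least threshold;
  // returns false at the first window that falls short (short-circuit)
  auto rollsum_t = [&x, threshold](int k) {
    for (int i = 0; i < x.length() - k; i++)
      if (sum(x[seq(i, i + k)]) < threshold) return false;
    return true;
  };
  // return the smallest k for which all windows pass, or 0 if none does
  for (int i = 0; i < x.length(); i++) if (rollsum_t(i)) return i;
  return 0;
}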
In R, source the file above:
Rcpp::sourceCpp("longest_rollsum.cpp")
rollsum_R <- function(df) {
  df %>%
    group_by(g) %>%
    summarise(longest = longest_rollsum_C(x, 2))
}
microbenchmark::microbenchmark(rollsum(df), rollsum_rcpp(df), rollsum_R(df))
Unit: milliseconds
             expr       min        lq      mean    median        uq      max neval
      rollsum(df) 24.052665 25.018864 26.985276 25.453187 27.479305 37.49629   100
 rollsum_rcpp(df)  4.077397  4.352724  4.755942  4.572804  4.902468 13.10230   100
    rollsum_R(df)  2.271907  2.529000  2.871154  2.714801  2.955849 10.62107   100
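
For readers who want to see the short-circuit idea in isolation, here is a minimal sketch in plain C++ without Rcpp. It is not part of the answer's benchmarked code: the names all_windows_reach and longest_rollsum_plain are hypothetical, and it swaps the per-window sum() for a running window sum that is updated as the window slides, which is an assumption about what might speed things up further rather than something measured here. It is meant to return the same value as longest_rollsum_C for the same input.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): true when every window
// of k + 1 consecutive elements of x sums to at least threshold. It stops at
// the first failing window (the short-circuit) and slides a running sum
// instead of re-summing each window from scratch.
bool all_windows_reach(const std::vector<double>& x, std::size_t k, double threshold) {
    if (k >= x.size()) return true;  // no complete window exists: vacuously true
    double window = 0.0;
    for (std::size_t j = 0; j <= k; ++j) window += x[j];  // sum of the first window
    if (window < threshold) return false;
    for (std::size_t i = 1; i + k < x.size(); ++i) {
        window += x[i + k] - x[i - 1];  // add the entering element, drop the leaving one
        if (window < threshold) return false;
    }
    return true;
}

// Hypothetical counterpart of longest_rollsum_C: the smallest k for which all
// windows of k + 1 elements reach threshold, or 0 if no k qualifies.
int longest_rollsum_plain(const std::vector<double>& x, double threshold) {
    for (std::size_t k = 0; k < x.size(); ++k)
        if (all_windows_reach(x, k, threshold)) return static_cast<int>(k);
    return 0;
}
```

For example, longest_rollsum_plain({1.0, 0.0, 1.0, 0.0, 1.0}, 2.0) tries k = 0, 1 and 2, each of which fails on an early window, and then accepts k = 3. Note that a running sum accumulates floating-point rounding differently from summing each window afresh, so results could differ when a window sum lies extremely close to the threshold.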