r - Faster modulo or equality check in R (or a good way to vectorize)

Question

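I have been working through Project Euler, trying to write computationally efficient programs. Consider problem 1: http://projecteuler.net/problem=1. I have raised the range from 1000 to 10,000,000 to highlight the inefficiencies.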

Here is my solution:

system.time({
    x <- 1:1E7
    a <- sum(as.numeric(x[x%%3 ==0 | x%%5==0]))
})
 user  system elapsed 
0.980   0.041   1.011

Here is some C++ code a friend wrote to do the same thing.

#include <iostream>
using namespace std;

int main(int argc, char** argv)
{
 long long x = 0;  // 64-bit accumulator: the total exceeds the 32-bit range
 for (int i = 1; i < 10000000; i++)
 {
   if (i % 3 == 0)
     x += i;
   else if (i % 5 == 0)
     x += i;
 }
 cout << x;
 return 0;
}
cbaden$ time ./a.out
23333331666668
real    0m0.044s
user    0m0.042s
sys     0m0.001s

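I know C++ should be faster than R, but this much faster? Rprof indicates that I spend nearly 60% of the time in the modulo operator and 13% in the "==" operation. Is there a faster vectorized way to do this?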

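A secondary concern is that I will run out of memory: this approach does not scale well as the range gets larger. Is there a good way to keep this vectorized without trying to hold the subset in memory?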

score 7 · Accepted Answer

Modulo is faster when it operates on integers rather than numerics:

f1 <- function() {
   x <- 1:1E7
   a <- sum(as.numeric(x[x%%3 ==0 | x%%5==0]))
}

f2 <- function() {
   x <- 1:1E7
   a <- sum(as.numeric(x[x %% 3L == 0L | x %% 5L == 0L]))
}

library(rbenchmark)
benchmark(f1(), f2(), replications = 5)
#   test replications elapsed relative user.self sys.self user.child sys.child
# 1 f1()            5   14.78 4.976431     13.95     0.67         NA        NA
# 2 f2()            5    2.97 1.000000      2.37     0.50         NA        NA

This is still far from C++ performance, but it is a step in the right direction.
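The two functions differ only in operand types: 1:1E7 is an integer vector, while a bare 3 is a double, so x %% 3 is evaluated in doubles and x %% 3L stays in integer arithmetic. A quick check of this on a small vector (a sketch; it shows the types involved, not the timings):

```r
x <- 1:10                      # the colon operator yields an integer vector

typeof(x %% 3)                 # "double":  3 is a double, so x is promoted
typeof(x %% 3L)                # "integer": both operands are integers

# Both forms select the same elements; only the arithmetic type differs.
same <- identical(x[x %% 3 == 0 | x %% 5 == 0],
                  x[x %% 3L == 0L | x %% 5L == 0L])
same                           # TRUE
```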

score 4 · Accepted Answer

A faster solution

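x <- 1E7
a <- x %/% 3    # number of multiples of 3 up to x
b <- x %/% 5    # number of multiples of 5 up to x
c <- x %/% 15   # multiples of 15 are counted in both a and b
ans <- 3*a*(a+1)/2 + 5*b*(b+1)/2 - 15*c*(c+1)/2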

It doesn't really help with the modulo question, though.
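The formula rests on the fact that the multiples of k up to n sum to k*m*(m+1)/2, where m = n %/% k, combined with inclusion-exclusion so that multiples of 15 are not counted twice. A minimal sketch that wraps this in a function and cross-checks it against the brute-force version (sum_multiples_3_or_5 is a hypothetical name chosen for illustration):

```r
# Hypothetical helper: sum of all multiples of 3 or 5 in 1..n.
sum_multiples_3_or_5 <- function(n) {
  tri <- function(k) {           # sum of the multiples of k up to n
    m <- n %/% k                 # how many multiples of k are <= n
    k * m * (m + 1) / 2
  }
  tri(3) + tri(5) - tri(15)      # multiples of 15 would otherwise count twice
}

# Cross-check against the brute-force version on a small range.
x <- 1:1000
brute <- sum(as.numeric(x[x %% 3L == 0L | x %% 5L == 0L]))
stopifnot(sum_multiples_3_or_5(1000) == brute)

sum_multiples_3_or_5(999)        # 233168, the answer to the original problem 1
sum_multiples_3_or_5(1E7 - 1)    # 23333331666668, matching the C++ output
```

No vector of length n is ever allocated, so memory use stays constant as the range grows. Note that 1:1E7 includes 1E7 itself, whereas the C++ loop stops at 9,999,999, so the R and C++ totals in the question differ by exactly 1E7.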

score 2 · Accepted Answer

A slight improvement [on the OP]

system.time({
  x_3 <- seq(3, 1E7, by = 3)
  x_5 <- seq(5, 1E7, by = 5)
  x_3_5 <- unique(c(x_3, x_5))
  a <- sum(as.numeric(x_3_5))
})
##  user  system elapsed 
##  1.53    0.13    1.66

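Edit: I used profr to profile the code and replaced seq and unique with their internal generic / default methods (seq.int and unique.default).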

new2 <-  function(){
  x_3 <- seq.int(3, 1E7, by = 3)
  x_5 <- seq.int(5, 1E7, by = 5)
  x_3_5 <- unique.default(c(x_3, x_5))
  a <- sum(as.numeric(x_3_5))
}

system.time(new2())
##   user  system elapsed 
##   1.11    0.04    1.16
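A further variation on the same idea, offered as an untimed sketch: the only values that appear in both sequences are the multiples of 15, so the unique() pass can be replaced by subtracting their sum once. The name new3 and the parameter n are hypothetical, continuing the naming above.

```r
new3 <- function(n = 1E7) {      # assumes n >= 15 so every sequence is non-empty
  # as.numeric() guards against integer overflow in sum()
  s3  <- sum(as.numeric(seq.int(3,  n, by = 3)))
  s5  <- sum(as.numeric(seq.int(5,  n, by = 5)))
  s15 <- sum(as.numeric(seq.int(15, n, by = 15)))
  s3 + s5 - s15                  # multiples of 15 appear in both sequences
}

small <- new3(1000)              # same range as 1:1000 in the brute-force form
```

This still allocates the three sequences, so it reduces the work rather than the memory footprint.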

For comparison (on my slow machine):

system.time({
    x <- 1:1E7
    a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
})

## user  system elapsed 
## 4.47    0.18    4.64

Benchmarking

orig <- function(){
  x <- 1:1E7
  a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
}

new <-  function(){
  x_3 <- seq(3, 1E7, by = 3)
  x_5 <- seq(5, 1E7, by = 5)
  x_3_5 <- unique(c(x_3, x_5))
  a <- sum(as.numeric(x_3_5))
}

benchmark(orig(), new(), new2(), replications = 5)
##     test replications elapsed relative 
## 2  new()            5    7.67 1.198438      
## 3 new2()            5    6.40 1.000000     
## 1 orig()            5   22.01 3.439063
