12

从矩阵中的每一列中提取最小值的最快方法是什么?


编辑:

将所有基准移至以下答案。

使用高、短或宽矩阵:

  ##  TEST DATA
  set.seed(1)
  matrix.inputs <- list(
        "Square Matrix"     = matrix(sample(seq(1e6), 4^2*1e4, T), ncol=400),   #  400 x  400
        "Tall Matrix"       = matrix(sample(seq(1e6), 4^2*1e4, T), nrow=4000),  # 4000 x   40
        "Wide-short Matrix" = matrix(sample(seq(1e6), 4^2*1e4, T), ncol=4000),  #   40 x 4000
        "Wide-tall Matrix"  = matrix(sample(seq(1e6), 4^2*1e5, T), ncol=4000),   #  400 x 4000
        "Tiny Sq Matrix"    = matrix(sample(seq(1e6), 4^2*1e2, T), ncol=40)     #   40 x   40
  )
4

6 回答 6

11

sos软件包非常适合回答此类问题。

library("sos")
findFn("colMins")
library("matrixStats")
?colMins

http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html

奇怪的是,对于我尝试的一个示例来说,colMins速度较慢。也许有人可以指出我的例子有什么好笑的?

set.seed(101); z <- matrix(runif(1e6),nrow=1000)
library(rbenchmark)
benchmark(colMins(z),apply(z,2,min))
##               test replications elapsed relative user.self sys.self
## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
## 1       colMins(z)          100  25.585     1.79    15.509    9.852
于 2012-12-03T03:53:31.900 回答
9

这是一种在方形和宽矩阵上速度更快的方法。它用于pmin矩阵的行。(如果您知道将矩阵拆分为行的更快方法,请随时编辑)

do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))

使用与@RicardoSaporta 相同的基准:

$`Square Matrix`
          test elapsed relative
3 pmin.on.rows   1.370    1.000
1          apl   1.455    1.062
2         cmin   2.075    1.515

$`Wide Matrix`
      test elapsed relative
3 pmin.on.rows   0.926    1.000
2         cmin   2.302    2.486
1          apl   5.058    5.462

$`Tall Matrix`
          test elapsed relative
1          apl   1.175    1.000
2         cmin   2.126    1.809
3 pmin.on.rows   5.813    4.947
于 2012-12-03T12:13:59.050 回答
6

2014 年 12 月 17 日更新

colMins()等。在最近版本的matrixStats中显着加快了速度。这是使用 matrixStats 0.12.2 更新的基准测试摘要,显示它(“cmin”)比第二快的方法快约 5-20 倍:

$`Square Matrix`
     test elapsed relative
2    cmin   0.216    1.000
1     apl   4.200   19.444
5 pmn.int   4.604   21.315
4     pmn   5.136   23.778
3    lapl  12.546   58.083

$`Tall Matrix`
     test elapsed relative
2    cmin   0.262    1.000
1     apl   3.006   11.473
5 pmn.int  18.605   71.011
3    lapl  22.798   87.015
4     pmn  27.583  105.279

$`Wide-short Matrix`
     test elapsed relative
2    cmin   0.346    1.000
5 pmn.int   3.766   10.884
4     pmn   3.955   11.431
3    lapl  13.393   38.708
1     apl  19.187   55.454

$`Wide-tall Matrix`
     test elapsed relative
2    cmin   5.591    1.000
5 pmn.int  39.466    7.059
4     pmn  40.265    7.202
1     apl  67.151   12.011
3    lapl 158.035   28.266

$`Tiny Sq Matrix`
     test elapsed relative
2    cmin   0.011    1.000
5 pmn.int   0.135   12.273
4     pmn   0.178   16.182
1     apl   0.202   18.364
3    lapl   0.269   24.455

先前评论 2013-10-09
仅供参考,因为 matrixStats v0.8.7 (2013-07-28),colMins()速度大约是以前的两倍。原因是先前使用的函数colRanges(),这解释了此处报告的“令人惊讶的缓慢”结果。和colMaxs()的速度相同。rowMins()rowMaxs()

于 2013-10-09T03:45:16.927 回答
3
lapply( split(mat, rep(1:dim(mat)[1], each=dim(mat)[2])), min)

which( ! apply(my.mat, 2, min, na.rm=T) ==
        sapply( split(my.mat, rep(1:dim(my.mat)[1], each=dim(my.mat)[2])), min) )
# named integer(0)
于 2012-12-03T04:35:11.070 回答
2

以下是迄今为止的答案集合。随着更多答案的贡献,这将被更新。

基准

  library(rbenchmark)
  library(matrixStats)  # for colMins


  list.of.tests <- list (
        ## Method 1: apply()  [original]
        apl =expression(apply(mat, 2, min, na.rm=T)),

        ## Method 2:  matrixStats::colMins [contributed by @Ben Bolker ]
        cmin = expression(colMins(mat)),

        ## Method 3: lapply() + split()  [contributed by @DWin ]
        lapl = expression(lapply( split(mat, rep(1:dim(mat)[1], each=dim(mat)[2])), min)),

        ## Method 4: pmin() / pmin.int()  [contributed by @flodel ]
        pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))),
        pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i)mat[i,]))) #,

        ## Method 5: ????
        #  e5 = expression(  ???  ),
        )  


  (times <- 
        lapply(matrix.inputs, function(mat)
            do.call(benchmark, args=c(list.of.tests, replications=500, order="relative"))[, c("test", "elapsed", "relative")]
  ))



  ############################# 
  #$         RESULTS         $#
  #$_________________________$#
  #############################

  # $`Square Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.842    1.000
  # 4     pmn   3.622    1.274
  # 1     apl   3.670    1.291
  # 2    cmin   5.826    2.050
  # 3    lapl  41.817   14.714  

  # $`Tall Matrix`
  #      test elapsed relative
  # 1     apl   2.622    1.000
  # 2    cmin   5.561    2.121
  # 5 pmn.int  11.264    4.296
  # 4     pmn  18.142    6.919
  # 3    lapl  48.637   18.550  

  # $`Wide-short Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.909    1.000
  # 4     pmn   3.018    1.037
  # 2    cmin   6.361    2.187
  # 1     apl  15.765    5.419
  # 3    lapl  41.479   14.259  

  # $`Wide-tall Matrix`
  #      test elapsed relative
  # 5 pmn.int  20.917    1.000
  # 4     pmn  26.188    1.252
  # 1     apl  38.635    1.847
  # 2    cmin  64.557    3.086
  # 3    lapl 434.761   20.785  

  # $`Tiny Sq Matrix`
  #      test elapsed relative
  # 5 pmn.int   0.112    1.000
  # 2    cmin   0.149    1.330
  # 4     pmn   0.174    1.554
  # 1     apl   0.180    1.607
  # 3    lapl   0.509    4.545
于 2012-12-04T21:27:17.417 回答
2

mat[(1:ncol(mat)-1)*nrow(mat)+max.col(t(-mat))]看起来相当快,而且它是基础 R。

于 2013-11-25T22:08:01.570 回答