I'm looking for a function that returns all unordered combinations of a vector. For example:

x <- c('red','blue','black')
uncomb(x)
[1]'red'
[2]'blue'
[3]'black'
[4]'red','blue'
[5]'blue','black'
[6]'red','black'
[7]'red','blue','black'

I suspect there is a function in some library that does this, but I can't find it. I tried permutations from gtools, but it isn't the function I'm looking for.

3 Answers

You can apply a sequence along the length of x over the m argument of combn():

x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
# 
# [[2]]
# [1] "blue"
# 
# [[3]]
# [1] "black"
# 
# [[4]]
# [1] "red"  "blue"
# 
# [[5]]
# [1] "red"   "black"
# 
# [[6]]
# [1] "blue"  "black"
# 
# [[7]]
# [1] "red"   "blue"  "black"
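The same idea can be wrapped into the uncomb() function the question asks for. This is a sketch under stated assumptions, not part of the original answer: uncomb is only the hypothetical name from the question, and it runs combn() over indices rather than values, because combn() treats a single positive integer n as seq_len(n). With x <- 5, the one-liner above would therefore return 1, 2, 3, 4, 5 instead of just 5.

```r
# uncomb: hypothetical name taken from the question, not an existing function.
uncomb <- function(x) {
  combos <- lapply(seq_along(x), function(m) {
    # All size-m subsets, generated over the indices 1..length(x) and mapped
    # back to elements of x, so a scalar x such as 5 is not expanded to 1:5
    combn(seq_along(x), m, FUN = function(i) x[i], simplify = FALSE)
  })
  # Flatten the list of lists into one list, ordered by subset size
  do.call(c, combos)
}

uncomb(c("red", "blue", "black"))  # the same 7 combinations as above
uncomb(5)                          # list(5), not the subsets of 1:5
```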

If you prefer a matrix result, you can apply stringi::stri_list2matrix() to the list above.

stringi::stri_list2matrix(
    do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
    byrow = TRUE
)
#      [,1]    [,2]    [,3]   
# [1,] "red"   NA      NA     
# [2,] "blue"  NA      NA     
# [3,] "black" NA      NA     
# [4,] "red"   "blue"  NA     
# [5,] "red"   "black" NA     
# [6,] "blue"  "black" NA     
# [7,] "red"   "blue"  "black"
answered 2015-01-14T22:29:35.703

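I was redirected here from List All Combinations With combn because this is one of the dupe targets. This is an old question and the answer provided by @RichScriven is very good, but I want to give the community a few more options that are arguably more natural and more efficient (the last two).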

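We first note that the output is very similar to the Power Set. Calling powerSet from the rje package, we see that our output matches every element of the power set except the first one, which is equivalent to the Empty Set: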

x <- c("red", "blue", "black")
rje::powerSet(x)
[[1]]
character(0)   ## empty set equivalent

[[2]]
[1] "red"

[[3]]
[1] "blue"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "black"

[[6]]
[1] "red"   "black"

[[7]]
[1] "blue"  "black"

[[8]]
[1] "red"   "blue"  "black"
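For comparison, the power set can also be generated without any package by reading each integer k from 0 to 2^n - 1 as a bit mask over the n elements. This is a minimal sketch, not how rje implements powerSet; power_set is a hypothetical helper name. For this input it yields the subsets in the same order as the output above:

```r
# power_set: hypothetical helper, a dependency-free sketch (not rje's code).
# Subset k (k = 0, ..., 2^n - 1) contains x[j] exactly when bit j of k is 1.
power_set <- function(x) {
  n <- length(x)
  lapply(0:(2^n - 1), function(k) {
    bits <- (k %/% 2^(seq_len(n) - 1)) %% 2  # binary digits of k, lowest bit first
    x[bits == 1]
  })
}

x <- c("red", "blue", "black")
power_set(x)      # 8 subsets; element 1 is character(0), the empty set
power_set(x)[-1]  # drop the empty set, as with rje::powerSet(x)[-1]
```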

If you don't want the first element, you can simply append [-1] to the function call, like this: rje::powerSet(x)[-1]

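The next two solutions come from the newer packages arrangements and RcppAlgos (I am the author of the latter), which give the user large efficiency gains. Both packages can generate combinations of multisets.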

Why does this matter?

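It can be shown that there is a one-to-one mapping from the power set of A to all combinations of the multiset c(rep(emptyElement, length(A)), A) choose length(A), where emptyElement is a representation of the empty set (such as zero or a blank). With this in mind, observe the following, where emptyElement is "" and only length(x) - 1 = 2 blanks are allowed, which drops the all-blank row corresponding to the empty set: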

## There is also a function called combinations in the
## rje package, so we fully qualify the function with the
## :: operator
library(arrangements)
arrangements::combinations(x = c("",x), k = 3, freq = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"

library(RcppAlgos)
comboGeneral(c("",x), 3, freqs = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"
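To see this mapping concretely without either package, the base-R sketch below pads each non-empty subset on the left with "" (standing in for emptyElement) and then strips the blanks again. It only illustrates the correspondence; it is not how arrangements or RcppAlgos generate their results:

```r
x <- c("red", "blue", "black")
# All non-empty subsets, as in the combn() answer
subsets <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

# Subset -> multiset combination: left-pad with "" up to length(x) elements
padded <- t(vapply(subsets, function(s) {
  c(rep("", length(x) - length(s)), s)  # e.g. "red" becomes "", "", "red"
}, character(length(x))))
padded  # the same rows as the combinations() / comboGeneral() output above

# Multiset combination -> subset: dropping the blanks recovers each subset
recovered <- lapply(seq_len(nrow(padded)), function(r) padded[r, padded[r, ] != ""])
identical(recovered, subsets)  # TRUE
```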

If you don't like dealing with blank elements and/or matrices, you can also return a list using lapply:

lapply(seq_along(x), comboGeneral, v = x)
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"


lapply(seq_along(x), function(y) arrangements::combinations(x, y))
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"

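Now we show that the last two methods are more efficient (NB: I removed do.call(c, and simplify = FALSE from the answer provided by @RichScriven so that outputs of a similar form are compared. I also included rje::powerSet for good measure):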

set.seed(8128)
bigX <- sort(sample(10^6, 20)) ## With this as an input, we will get 2^20 - 1 results, i.e. 1,048,575
library(microbenchmark)
microbenchmark(powSetRje = rje::powerSet(bigX),
               powSetRich = lapply(seq_along(bigX), combn, x = bigX),
               powSetArrange = lapply(seq_along(bigX), function(y) arrangements::combinations(x = bigX, k = y)),
               powSetAlgos = lapply(seq_along(bigX), comboGeneral, v = bigX),
               unit = "relative")

Unit: relative
          expr        min        lq      mean   median        uq      max neval
     powSetRje 64.4252454 44.063199 16.678438 18.63110 12.082214 7.317559   100
    powSetRich 61.6766640 43.027789 16.009151 17.88944 11.406994 7.222899   100
 powSetArrange  0.9508052  1.060309  1.080341  1.02257  1.262713 1.126384   100
   powSetAlgos  1.0000000  1.000000  1.000000  1.00000  1.000000 1.000000   100
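As a quick check on the size of this benchmark: an n-element vector has sum(choose(n, 1:n)) = 2^n - 1 non-empty subsets, which for n = 20 is the 1,048,575 mentioned in the comment on bigX (rje::powerSet also returns the empty set, so it produces one more element):

```r
n <- 20
# Non-empty subsets of n items, counted by size, then summed
sum(choose(n, seq_len(n)))  # 1048575
2^n - 1                     # 1048575
```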

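Going further, arrangements comes with an argument called layout that lets the user choose a specific format for the output. One of these is layout = "l", for list. It is similar to setting simplify = FALSE in combn and lets us obtain output like that of powerSet. Observe: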

do.call(c, lapply(seq_along(x), function(y) {
                    arrangements::combinations(x, y, layout = "l")
                  }))
[[1]]
[1] "red"

[[2]]
[1] "blue"

[[3]]
[1] "black"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "red"   "black"

[[6]]
[1] "blue"  "black"

[[7]]
[1] "red"   "blue"  "black"

And the benchmark:

microbenchmark(powSetRje = rje::powerSet(bigX)[-1],
               powSetRich = do.call(c, lapply(seq_along(bigX), combn, x = bigX, simplify = FALSE)),
               powSetArrange = do.call(c, lapply(seq_along(bigX), function(y) arrangements::combinations(bigX, y, layout = "l"))),
               times = 15, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq      max neval
     powSetRje 5.539967 4.785415 4.277319 4.387410 3.739593 3.543570    15
    powSetRich 4.994366 4.306784 3.863612 3.932252 3.334708 3.327467    15
 powSetArrange 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    15
answered 2018-03-31T18:57:40.180
A solution with a matrix result that doesn't use any external packages:

x <- c("red", "blue", "black")
store <- lapply(
  seq_along(x), 
  function(i) {
    out <- combn(x, i)  # i rows, one combination per column
    N <- NCOL(out)
    # Pad every column with NA so each block has length(x) rows
    padded <- matrix(NA_character_, nrow = length(x), ncol = N)
    padded[seq_len(i), ] <- out
    padded
})
t(do.call(cbind, store))

     [,1]    [,2]    [,3]   
[1,] "red"   NA      NA     
[2,] "blue"  NA      NA     
[3,] "black" NA      NA     
[4,] "red"   "blue"  NA     
[5,] "red"   "black" NA     
[6,] "blue"  "black" NA     
[7,] "red"   "blue"  "black"
answered 2020-04-03T14:25:14.597