
Below is a function that creates all possible ways of splitting the elements of x into n groups (all groups having the same number of elements).

The function:

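perm.groups <- function(x,n){
    nx <- length(x)
    ning <- nx/n   # number of elements in every group

    # Candidate first groups: x[1] plus every combination of ning-1
    # elements of x[-1] (one candidate group per column)
    group1 <- 
      rbind(
        matrix(rep(x[1],choose(nx-1,ning-1)),nrow=1),
        combn(x[-1],ning-1)
      )
    ng <- ncol(group1)

    if(n > 2){
      out <- vector('list',ng)

      for(i in seq_len(ng)){
        # Recursively split the remaining elements into n-1 groups ...
        other <- perm.groups(setdiff(x,group1[,i]),n=n-1)
        # ... and put the current first group in front of each of those splits
        out[[i]] <- lapply(seq_along(other),
                       function(j) cbind(group1[,i],other[[j]])
                    )
      }
      out <- unlist(out,recursive=FALSE)
    } else {
      # Two groups: the second group is whatever is left over
      other <- lapply(seq_len(ng),function(i) 
                  matrix(setdiff(x,group1[,i]),ncol=1)
                )
      out <- lapply(seq_len(ng),
                    function(i) cbind(group1[,i],other[[i]])
              )
    }
    out   # list of matrices, one column per group
}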

Pseudocode (explanation)

nb = number of groups (the argument n of perm.groups)
ning = number of elements in every group
if(nb == 2)
   1. take the first element and add it to every possible 
       combination of ning-1 elements of x[-1] 
   2. for each group defined in step 1, take the set difference 
       between x and that group to get the matching second group
   3. combine the groups from step 2 with the matching groups from step 1

if(nb > 2)
   1. take the first element and add it to every possible 
       combination of ning-1 elements of x[-1] 
   2. to find the other groups that go with each first group obtained this way, 
       apply the algorithm to the remaining elements of x, but for nb-1 groups
   3. combine all possible other groups from step 2 
       with the matching first groups from step 1
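The recursion in the pseudocode also tells you how many groupings the function returns: step 1 yields choose(nx-1, ning-1) first groups, and each one is combined with every grouping of the remaining nx-ning elements into nb-1 groups. The sketch below (the helper names count.groups and count.closed are made up for illustration) checks that recurrence against the closed form nx! / ((ning!)^nb * nb!), which shows how quickly full enumeration grows:

```r
# Hypothetical helper: count the groupings produced by the recursion above.
# Step 1 gives choose(nx - 1, ning - 1) first groups; each is paired with
# every grouping of the nx - ning remaining elements into n - 1 groups.
count.groups <- function(nx, n) {
  ning <- nx / n
  if (n == 1) return(1)
  choose(nx - 1, ning - 1) * count.groups(nx - ning, n - 1)
}

# Hypothetical helper: the same count in closed form,
# nx! / ((ning!)^n * n!), computed in floating point via factorial()
count.closed <- function(nx, n) {
  ning <- nx / n
  factorial(nx) / (factorial(ning)^n * factorial(n))
}

count.groups(12, 3)   # 5775 groupings for x = 1:12, n = 3
count.closed(12, 3)   # same value from the closed form
count.groups(20, 4)   # 488864376: far too many to enumerate
```

So for the x = 1:12, n = 3 case in the question, full enumeration is still feasible, but it stops being practical very quickly as x grows.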

This function (and the pseudocode) was originally written by Joris Meys in an earlier post: Find all possible ways to split a list of elements into a given number of groups of the same size

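Is there a way to write a function that instead returns a given number of random groupings? Such a function would take a third argument, percent.possibilities or number.possiblities, fixing the number of distinct random groupings the function returns.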

Something like:

new.perm.groups(x=1:12,n=3,number.possiblities=50)


1 Answer


Following @JackManey's suggestion, you can sample a single grouping with equal probability using

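sample.perm.group <- function(ning, ngrp)
{
    # Only one group left: it holds all the indices
    if( ngrp==1 ) return(seq_len(ning))

    # The first group always contains index 1 plus ning-1 indices drawn
    # uniformly, without replacement, from 2:(ning*ngrp)
    g1 <- 1+sample(ning*ngrp-1, size=ning-1)

    g1 <- c(1, g1[order(g1)])

    # Recursively split the indices not in g1 into ngrp-1 groups
    remaining <- seq_len(ning*ngrp)[-g1]

    cbind(g1, matrix(remaining[sample.perm.group(ning, ngrp-1)], nrow=ning), deparse.level=0)
}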

where ning is the number of elements per group and ngrp is the number of groups.

It returns indices, so if you have an arbitrary vector you can use them to permute it:

> ind <- sample.perm.group(3,3)
> ind
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    3    7    6
[3,]    4    8    9
> LETTERS[1:9][ind]
[1] "A" "C" "D" "B" "G" "H" "E" "F" "I"
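Note that LETTERS[1:9][ind] comes back as a plain vector, so the group structure is lost. A small follow-up sketch (the names x, groups and group.list are only illustrative, and ind is copied from the output above) restores one group per column, or builds a list with one element per group:

```r
# Index matrix copied from the example output above: one column per group
ind <- matrix(c(1, 3, 4,
                2, 7, 8,
                5, 6, 9), nrow = 3)
x <- LETTERS[1:9]

# x[ind] is a plain vector (filled column by column), so put back
# the ning x ngrp layout explicitly; columns become A C D | B G H | E F I
groups <- matrix(x[ind], nrow = nrow(ind))
print(groups)

# Alternatively, split by column number: one list element per group
group.list <- split(x[ind], col(ind))
print(group.list)
```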

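To generate a sample of N such groupings, you have two options. If you allow repetition, i.e. sampling with replacement, all you have to do is run the previous function N times. On the other hand, if your sample is to be drawn without replacement, you can use a rejection mechanism: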

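sample.perm.groups <- function(ning, ngrp, N)
{
    # Sampling without replacement by rejection.
    # Note: this loops forever if N exceeds the number of distinct groupings.
    result <- list(sample.perm.group(ning, ngrp))

    for( i in seq_len(N-1) )
    {
        repeat
        {
            y <- sample.perm.group(ning, ngrp)

            # Accept y only if it differs from every grouping drawn so far
            # (each column is sorted and columns are ordered by their smallest
            # index, so identical groupings give identical matrices)
            if( all(vapply(result, function(x)any(x!=y), logical(1))) ) break
        }

        result[[i+1]] <- y
    }

    result
}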

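This is clearly an equal-probability sampling design, and the rejection step is unlikely to be inefficient, since the number of possible groupings is usually much larger than N (rejections only become frequent as N approaches that number).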
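One cost of the loop above is that every new draw is compared against all previous ones, so checking grows with N. A possible variant, sketched here as an assumption rather than part of the original answer, remembers the accepted draws as string keys in a hashed environment and caps the number of attempts instead of looping forever. The function name sample.distinct and its max.tries argument are hypothetical; draw is any zero-argument function returning one grouping in a fixed canonical form.

```r
# Hypothetical variant of the rejection sampler (not from the answer above).
# `draw` returns one random grouping as a matrix of fixed dimensions in a
# canonical form, so equal groupings always produce equal keys.
sample.distinct <- function(draw, N, max.tries = 100 * N) {
  seen <- new.env(hash = TRUE)   # keys of groupings accepted so far
  result <- vector("list", N)
  k <- 0
  tries <- 0
  while (k < N) {
    tries <- tries + 1
    if (tries > max.tries)
      stop("too many rejections: is N larger than the number of groupings?")
    y <- draw()
    key <- paste(y, collapse = ",")   # column-major contents as one string
    if (!exists(key, envir = seen, inherits = FALSE)) {
      assign(key, TRUE, envir = seen)  # reject later duplicates of y
      k <- k + 1
      result[[k]] <- y
    }
  }
  result
}
```

For example, sample.distinct(function() sample.perm.group(4, 3), 50) would draw 50 distinct groupings of 12 indices into 3 groups of 4, while sampling with replacement is just replicate(50, sample.perm.group(4, 3), simplify = FALSE).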

answered 2013-04-01T22:08:33.943