r - Combination-like task with combinations as elements in R

Question

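Not sure whether I picked a good title... and I don't know whether I'm using the right terminology, so maybe with the right search terms I would find a solution to this problem...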

I have a list of strings from which I want to get all "exclusive" combinations of 3.

Example: using the following

require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))

I get a table listing all possible combinations of 3 of these 6 strings (so each row is one combination of 3):

        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strA" "strB" "strD"
   [3,] "strA" "strB" "strE"
   [4,] "strA" "strB" "strF"
   [5,] "strA" "strC" "strD"
   [6,] "strA" "strC" "strE"
   [7,] "strA" "strC" "strF"
   [8,] "strA" "strD" "strE"
   [9,] "strA" "strD" "strF"
  [10,] "strA" "strE" "strF"
  [11,] "strB" "strC" "strD"
  [12,] "strB" "strC" "strE"
  [13,] "strB" "strC" "strF"
  [14,] "strB" "strD" "strE"
  [15,] "strB" "strD" "strF"
  [16,] "strB" "strE" "strF"
  [17,] "strC" "strD" "strE"
  [18,] "strC" "strD" "strF"
  [19,] "strC" "strE" "strF"
  [20,] "strD" "strE" "strF"

But I want all groupings into combinations of 3 in which each string appears only once. So my desired output looks like this:

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strD"
   [2,] "strC" "strE" "strF"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strE"
   [2,] "strC" "strD" "strF"
...

So each sub-element ($1, $2, $3, etc.) contains 2 combinations of 3 strings (since 2*3=6 and there are 6 strings). Within each set, no string may appear more than once.
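One way to produce exactly this kind of output, instead of filtering the rows of combn(), is to build each set recursively: take the first remaining string, pair it with every choice of 2 of the other strings, and split whatever is left the same way. Because the first remaining string always opens the next group, no set is generated twice with its rows swapped. This is a minimal sketch that assumes length(mylist) is a multiple of 3; partition_triples is a hypothetical helper name, not a function from a package or from the answers below.

```r
# Hypothetical helper: list every way to split the strings in x into
# disjoint groups of 3. Assumes length(x) %% 3 == 0.
# Each result is a character matrix with one group per row.
partition_triples <- function(x) {
  if (length(x) == 0) return(list(matrix(character(0), ncol = 3)))
  first <- x[1]
  rest <- x[-1]
  out <- list()
  # x[1] always opens the first group, so no partition is produced twice
  # with its groups in a different order
  mates <- combn(rest, 2)
  for (k in seq_len(ncol(mates))) {
    group <- c(first, mates[, k])
    # recursively split the strings that are still unused
    for (sub in partition_triples(setdiff(rest, mates[, k]))) {
      out[[length(out) + 1]] <- rbind(group, sub, deparse.level = 0)
    }
  }
  out
}

mylist <- c("strA", "strB", "strC", "strD", "strE", "strF")
res <- partition_triples(mylist)
length(res)  # 10
res[[1]]     # rows: "strA" "strB" "strC" / "strD" "strE" "strF"
```

The first three elements of res correspond to $1, $2 and $3 above. Full enumeration is only practical for small inputs: for 9 strings there are already 280 such sets, and the count grows very quickly.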

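Of course, it would be great if this also worked when the length of mylist is not a multiple of n=3. If we assume there are 10 strings (adding "strG", "strH", "strI" and "strJ"), I would like one string to be left out of each set. So the desired result would be something like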

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strI"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strJ"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strI" "strJ"
$4
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strE" "strF" "strG"
   [3,] "strH" "strI" "strJ"
...
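For lengths that are not a multiple of 3, one possible approach (an assumption about what "leave one string out" should mean, not taken from the answers below) is to first drop length(mylist) %% 3 strings in every possible way and then split the remaining strings into groups of 3. The sketch repeats a compact version of a recursive splitting helper so it runs on its own; both function names are hypothetical. The sets come out in a different order than in the listing above.

```r
# Hypothetical helper: every split of x into disjoint groups of 3
# (length(x) must be a multiple of 3); x[1] always opens the first group.
partition_triples <- function(x) {
  if (length(x) == 0) return(list(matrix(character(0), ncol = 3)))
  out <- list()
  mates <- combn(x[-1], 2)
  for (k in seq_len(ncol(mates))) {
    for (sub in partition_triples(setdiff(x[-1], mates[, k]))) {
      out[[length(out) + 1]] <- rbind(c(x[1], mates[, k]), sub, deparse.level = 0)
    }
  }
  out
}

# Hypothetical helper: leave out length(x) %% 3 strings in every possible
# way, then split the remaining strings into groups of 3.
exclusive_triples <- function(x) {
  r <- length(x) %% 3
  if (r == 0) return(partition_triples(x))
  left_out <- combn(x, r)  # one column per choice of strings to omit
  out <- list()
  for (k in seq_len(ncol(left_out))) {
    out <- c(out, partition_triples(setdiff(x, left_out[, k])))
  }
  out
}

mylist <- paste("str", LETTERS[1:10], sep = "")
res <- exclusive_triples(mylist)
length(res)  # 2800 = 10 choices of the omitted string x 280 splits of the other 9
```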

Does anyone have a solution for this? Please let me know if my explanation is unclear.

Cheers

score 1 · Accepted Answer

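Based on 42's help (thanks again!), I came up with an approach that is by no means elegant but gets the job done (slowly...). It is only feasible because I can eliminate some of the possible combinations before running the steps below. In my original problem I have 49 strings, which leads to very large vectors, so be careful when applying these steps to more than 15 strings. There must be a way to calculate how many combinations have to be processed...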
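The number of exclusive sets can be computed directly. With n strings, r = n %% 3 strings are left out and k = (n - r) / 3 groups remain, so there are choose(n, r) ways to pick the omitted strings and (n - r)! / ((3!)^k * k!) ways to split the rest into k unordered groups of 3. The sketch below just evaluates that formula; count_exclusive_triples is a hypothetical name.

```r
# Hypothetical helper: number of "exclusive" sets for n strings, i.e. ways to
# leave out n %% 3 strings and split the rest into unordered groups of 3.
count_exclusive_triples <- function(n) {
  r <- n %% 3          # strings left out of every set
  k <- (n - r) %/% 3   # groups of 3 in every set
  # choose(n, r) ways to pick the omitted strings, times
  # (n - r)! / (3!^k * k!) ways to split the rest into k unordered triples;
  # round() removes floating-point noise from factorial()
  round(choose(n, r) * factorial(n - r) / (factorial(3)^k * factorial(k)))
}

count_exclusive_triples(6)   # 10
count_exclusive_triples(10)  # 2800
count_exclusive_triples(49)  # on the order of 1e37
```

For comparison, the loop below only tries choose(49, 3) = 18424 starting combinations, so for 49 strings it can only ever produce a tiny fraction of all possible sets.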

Here is the complete example:

require(utils)
mylist <- paste("str", LETTERS[1:10], sep = "")
# one row per combination of 3 strings, kept as character (not factor) columns
mat <- as.data.frame(t(combn(mylist, 3, simplify = TRUE)))
mat[] <- lapply(mat, as.character)

# TRUE for every row of df that contains at least one of the strings in `used`
overlaps <- function(df, used) {
  df[, 1] %in% used | df[, 2] %in% used | df[, 3] %in% used
}

mat.subset <- list()
for (i in seq(nrow(mat)))
{
  # start a new set with combination i and drop every row that clashes with it
  mat.subset[[i]] <- mat[i, ]
  mat.temp <- mat[!overlaps(mat, unlist(mat[i, 1:3])), ]
  j <- 1
  while (j <= nrow(mat.temp))
  {
    if (!length(intersect(unlist(mat.temp[j, 1:3]), unlist(mat.subset[[i]]))))
    {
      mat.subset[[i]] <- rbind(mat.subset[[i]], mat.temp[j, ])
      # drop every row that clashes with the row just added (including itself);
      # j is not advanced because the next candidate now sits at position j
      mat.temp <- mat.temp[!overlaps(mat.temp, unlist(mat.temp[j, 1:3])), ]
    } else {
      j <- j + 1
    }
  }
}
# keep only the starting points that produced the largest sets
mat.subset.lengths <- unlist(lapply(mat.subset, function(x) nrow(x)))
mat.subset <- mat.subset[which(mat.subset.lengths == max(mat.subset.lengths))]
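To double-check that every set kept in mat.subset really is exclusive, it is enough to look for a repeated string in the flattened set. is_exclusive is a hypothetical helper, and the two data frames below are made-up examples in the same shape as the elements of mat.subset.

```r
# Hypothetical helper: TRUE when no string occurs more than once in a set of
# triples. Works on a data frame (like the elements of mat.subset) or a matrix;
# as.vector() flattens it so anyDuplicated() compares single strings, not rows.
is_exclusive <- function(set) {
  !anyDuplicated(as.vector(unlist(set)))
}

# made-up example sets
ok  <- data.frame(V1 = c("strA", "strD"), V2 = c("strB", "strE"),
                  V3 = c("strC", "strF"), stringsAsFactors = FALSE)
bad <- data.frame(V1 = c("strA", "strA"), V2 = c("strB", "strE"),
                  V3 = c("strC", "strF"), stringsAsFactors = FALSE)
is_exclusive(ok)   # TRUE
is_exclusive(bad)  # FALSE
```

After running the loop above, all(sapply(mat.subset, is_exclusive)) should return TRUE if it works as intended.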

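The last two steps are necessary in my case because, as mentioned above, some combinations are excluded before the time-consuming for loop, and only a certain number of starting points lead to a complete solution (or, in the worst case, a nearly complete one).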

If you have a tip on how to make this procedure cover more sets, or a more elegant approach, I would appreciate your input.

score 1 · Accepted Answer

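Assuming the transposed combination matrix is named mat, check whether two rows overlap at all by taking the length of the result of the intersect function: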

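 mat <- t(combn(mylist, 3))  # the transposed combination matrix from the question
 res <- list()
 for (i in 1:nrow(mat)) {
    for (j in 1:nrow(mat)) {
       # rows i and j share no string: store them together as one candidate set
       if (!length(intersect(mat[i, ], mat[j, ])))
          res[[paste(i, j, sep = "_")]] <- rbind(mat[i, ], mat[j, ])
    }
 }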


> res
$`1_20`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"

$`2_19`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strD"
[2,] "strC" "strE" "strF"

$`3_18`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strE"
[2,] "strC" "strD" "strF"

.... snipped

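Depending on your definition of "unique", you may decide to keep only the first ten items, since the other half are the same pairs with the two rows swapped: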

> res[[1]]
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
> res[[20]]
     [,1]   [,2]   [,3]  
[1,] "strD" "strE" "strF"
[2,] "strA" "strB" "strC"
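If the swapped duplicates are not wanted at all, one option is to test only unordered pairs of row indices (i < j), so each pair is generated once. The sketch below does this with combn() on the row indices instead of the double loop; it is an alternative formulation, not the answerer's code, and pairs and keep are illustrative names.

```r
mylist <- paste("str", LETTERS[1:6], sep = "")
mat <- t(combn(mylist, 3))

# every unordered pair of row indices (i < j), one pair per column
pairs <- combn(nrow(mat), 2)
# TRUE where the two rows share no string
keep <- apply(pairs, 2, function(p) !length(intersect(mat[p[1], ], mat[p[2], ])))

res <- lapply(which(keep), function(k) rbind(mat[pairs[1, k], ], mat[pairs[2, k], ]))
names(res) <- paste(pairs[1, keep], pairs[2, keep], sep = "_")
length(res)      # 10
names(res)[1:3]  # "1_20" "2_19" "3_18"
```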
