r - How to generate all possible vectors based on sampling without replacement in R?

Question

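I have a pool of seven numbers. I want to generate all vectors of length 7 such that:

The first two elements are drawn from the pool of 7 numbers.
The next two elements are drawn from the remaining 5 numbers.
The final three elements are drawn from the remaining 3 numbers.

This scheme can be described by the vector c(2,2,3).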

For example:
sample <- c(8.93,9.11,9.12,9.05,8.87,8.95,9.02)
structure <- c(2,2,3)

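I know there are 7C2*5C2*3C3 = 210 such vectors. To be clear, I don't need permutations within each group of elements. For example, the two vectors c(8.93,9.11,9.12,9.05,8.87,8.95,9.02) and c(9.11,8.93,9.12,9.05,8.87,8.95,9.02) are the same to me, and I need only one of them to appear in the list of 210 vectors.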
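The count above can be checked, and generalized to any group structure, with choose(). This is a minimal sketch; the helper name count_grouped_draws is hypothetical and not part of the question.

```r
# Number of grouped draws when a pool of size sum(k) is split into
# consecutive groups of sizes k, ignoring order within each group.
count_grouped_draws <- function(k) {
    # Pool size remaining before each draw: for k = c(2, 2, 3) this is 7, 5, 3.
    remaining <- rev(cumsum(rev(k)))
    prod(choose(remaining, k))
}

count_grouped_draws(c(2, 2, 3))  # 21 * 10 * 1 = 210
count_grouped_draws(c(2, 5))     # 21 * 1 = 21
```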

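Here is what I have done using a for loop, combn and setdiff. However, I would like to use the structure vector in the code and make it more flexible, for example c(2,5) instead of c(2,2,3). Is there a neater solution that generalizes my problem, for example with the apply family of functions?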

df<-data.frame()
sample <- c(8.93,9.11,9.12,9.05,8.87,8.95,9.02)
combn(sample,2) -> com1
for (i in 1:ncol(com1)){
    setdiff(sample,com1[,i]) -> com2
    combn(com2,2) -> com3
    for (j in 1:ncol(com3)){
        setdiff(com2,com3[,j]) -> com4
        c(com1[,i],com3[,j],com4) -> de
        df <- rbind(df,de)
    }
}
df

score 0 · Accepted Answer

A recursive version in base R:

x <- c(8.93,9.11,9.12,9.05,8.87,8.95,9.02)
k <- c(2, 2, 3)

f <- function(el, l) {
    if (length(l)==1L) {
        return(data.frame(t(el)))
    }

    do.call(rbind, combn(el, l[1L], 
        #using code directly from setdiff for slight speedup and 
        #comparing integers for robustness
        function(s) cbind(data.frame(t(s)), f(el[match(el, s, 0L) == 0L], l[-1L])),
        simplify=FALSE))
}

apply(f(seq_along(x), k), 1L:2L, function(i) x[i])
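The code above recurses over positions (seq_along(x)) and only maps back to values at the end. One plausible benefit, sketched here on a hypothetical pool with a repeated value, is that set operations on values silently drop duplicates, while set operations on positions do not.

```r
x <- c(1, 1, 2)  # hypothetical pool containing a repeated value

# Working on values: setdiff() returns unique elements,
# so removing the value 1 removes both copies.
by_value <- setdiff(x, 1)

# Working on positions: only the drawn position is removed.
idx <- seq_along(x)
by_index <- x[setdiff(idx, 1L)]

by_value  # 2
by_index  # 1 2
```

Comparing integer positions also avoids relying on exact equality of floating-point values.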

score 0 · Accepted Answer

find_combns_in_remainders <- function(list_combns_and_remainders, m) {
  unlist(lapply(
    list_combns_and_remainders,
    function(.) combn(x = .$remainder,
                      m = m,
                      FUN = function(combination) 
                        list(combination = c(.$combination, combination),
                             remainder = setdiff(.$remainder, combination)),
                      simplify = FALSE)
  ), recursive = FALSE)
}

Reduce(
  x = structure, 
  f = find_combns_in_remainders, 
  init = list(list(combination = numeric(0), remainder = sample))
)

# [[1]]
# [[1]]$combination
# [1] 8.93 9.11 9.12 9.05 8.87 8.95 9.02
# 
# [[1]]$remainder
# numeric(0)
# 
# 
# [[2]]
# [[2]]$combination
# [1] 8.93 9.11 9.12 8.87 9.05 8.95 9.02
# 
# [[2]]$remainder
# numeric(0)
# 
# 
# [[3]]
# [[3]]$combination
# [1] 8.93 9.11 9.12 8.95 9.05 8.87 9.02
# 
# [[3]]$remainder
# numeric(0)
# 
# 
# ....
# 
# 
# [[208]]
# [[208]]$combination
# [1] 8.95 9.02 9.12 9.05 8.93 9.11 8.87
# 
# [[208]]$remainder
# numeric(0)
# 
# 
# [[209]]
# [[209]]$combination
# [1] 8.95 9.02 9.12 8.87 8.93 9.11 9.05
# 
# [[209]]$remainder
# numeric(0)
# 
# 
# [[210]]
# [[210]]$combination
# [1] 8.95 9.02 9.05 8.87 8.93 9.11 9.12
# 
# [[210]]$remainder
# numeric(0)

score 0 · Accepted Answer

Since you mention combn and setdiff, here is one possibility:

We first create a convenience function draw that draws ndraw samples from x and stores the results in lst.

draw <- function(x, ndraw, lst) {
    unlist(lapply(lst, function(y) {
        lapply(
            # all ways to draw ndraw values from what is left after y
            combn(setdiff(x, y), ndraw, simplify = FALSE),
            function(z) c(y, z))
    }), recursive = FALSE)
}

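Then we can define a function generate_samples that draws as many samples as there are entries in draws. I added a check to make sure that the sum of draws equals the length of x.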

generate_samples <- function(x, draws) {
    stopifnot(sum(draws) == length(x))
    res <- list(NULL)
    for (i in seq_along(draws)) res <- draw(x, draws[i], res)
    res
}

In your specific case we would do

lst <- generate_samples(sample, draws = structure)
#[[1]]
#[1] 8.93 9.11 9.12 9.05 8.87 8.95 9.02
#
#[[2]]
#[1] 8.93 9.11 9.12 8.87 9.05 8.95 9.02
#
#[[3]]
#[1] 8.93 9.11 9.12 8.95 9.05 8.87 9.02
#
#[[4]]
#[1] 8.93 9.11 9.12 9.02 9.05 8.87 8.95
#
#[[5]]
#[1] 8.93 9.11 9.05 8.87 9.12 8.95 9.02
#
#[[6]]
#[1] 8.93 9.11 9.05 8.95 9.12 8.87 9.02
# ....

We confirm that this indeed produces 210 elements in the output list
```
length(lst)
#[1] 210
```
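Beyond the length check, one can also verify that no grouping appears twice once order within each group is ignored. The sketch below repeats the two functions so that it runs on its own, and uses the alternative structure c(2, 5) from the question. The names x, grp and keys are illustrative only.

```r
draw <- function(x, ndraw, lst) {
    unlist(lapply(lst, function(y) {
        lapply(combn(setdiff(x, y), ndraw, simplify = FALSE),
               function(z) c(y, z))
    }), recursive = FALSE)
}

generate_samples <- function(x, draws) {
    stopifnot(sum(draws) == length(x))
    res <- list(NULL)
    for (i in seq_along(draws)) res <- draw(x, draws[i], res)
    res
}

x <- c(8.93, 9.11, 9.12, 9.05, 8.87, 8.95, 9.02)
draws <- c(2, 5)
lst <- generate_samples(x, draws)

# Canonical key: sort within each group so within-group order is ignored.
grp <- rep(seq_along(draws), draws)
keys <- vapply(lst, function(v) {
    paste(unlist(lapply(split(v, grp), sort)), collapse = " ")
}, character(1))

length(lst)          # should equal choose(7, 2) * choose(5, 5)
anyDuplicated(keys)  # 0 means no grouping appears twice
```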

score 0 · Accepted Answer

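Is this what you need? The question feels contradictory: it says "I want to generate all vectors of length 7", but then says you only need one of the two vectors in the example. Wouldn't using combn leave you with a random sample?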

library(combinat)
x1 <- permn(sample[1:2])
x2 <- permn(sample[3:4])
x3 <- permn(sample[5:7])

all <- expand.grid(x1, x2, x3)
apply(all, 1, unlist)
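As a rough size check in base R, assuming expand.grid pairs every permutation of one group with every permutation of the others, the approach above enumerates orderings within the three fixed groups. That count can be compared with the number of groupings stated in the question.

```r
group_sizes <- c(2, 2, 3)

# Orderings within the fixed groups: 2! * 2! * 3!
n_within_group_orderings <- prod(factorial(group_sizes))

# Distinct groupings asked for in the question: 7C2 * 5C2 * 3C3
n_groupings <- choose(7, 2) * choose(5, 2) * choose(3, 3)

n_within_group_orderings  # 24
n_groupings               # 210
```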
