r - Sampling from a larger list using a sample list as a template, without wrapping around

Question

If I have a vector of letters:

> all <- letters
> all
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

and I then define a reference sample from those letters as follows:

> refSample <- c("j","l","m","s")

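where the spacings between the elements are 2 (1st to 2nd), 1 (2nd to 3rd) and 6 (3rd to 4th), how can I then select n samples from the vector all that have the same, non-wrapping spacing between their elements as refSample? For example, "a","c","d","j" and "q","s","t","z" would be valid samples, but "a","c","d","k" and "r","t","u","a" would not. The former has an index difference of 7 (rather than 6) between its 3rd and last elements, while the latter has the correct spacing but wraps around the end of the alphabet.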

Secondly, how can I parameterize this so that, whatever refSample is used, I can use its spacing as the template?
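The validity rule described in the question can be checked mechanically by comparing index differences. The following is a minimal sketch, not part of the original post; `has_ref_spacing` is a hypothetical helper name introduced here for illustration.

```r
all <- letters
refSample <- c("j", "l", "m", "s")

# Hypothetical helper (an assumption, not from the original question):
# TRUE when `cand` has the same index spacing within `full` as `ref`.
# match() only returns positions inside `full`, so a wrapped sample
# shows up as a negative gap and fails the comparison.
has_ref_spacing <- function(cand, ref, full) {
  identical(diff(match(cand, full)), diff(match(ref, full)))
}

diff(match(refSample, all))                             # 2 1 6
has_ref_spacing(c("a", "c", "d", "j"), refSample, all)  # TRUE
has_ref_spacing(c("q", "s", "t", "z"), refSample, all)  # TRUE
has_ref_spacing(c("a", "c", "d", "k"), refSample, all)  # FALSE: last gap is 7
has_ref_spacing(c("r", "t", "u", "a"), refSample, all)  # FALSE: last gap is -20
```

Because the reference is passed as an argument, the same check works for any other reference sample drawn from the full vector.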

score 3 · Accepted Answer

Here's a simple way:

all <- letters
refSample <- c("j","l","m","s")

pick_matches <- function(n, ref, full) {
  iref <- match(ref,full)                # positions of the reference elements in full
  spaces <- diff(iref)                   # spacing template, e.g. 2 1 6
  tot_space <- sum(spaces)               # distance from first to last element
  max_start <- length(full) - tot_space  # last start position that does not wrap
  starts <- sample(1:max_start, n, replace = TRUE)  # random start positions
  # rebuild the indices for each start from the spacing template
  return( sapply( starts, function(s) full[ cumsum(c(s, spaces)) ] ) )
}

> set.seed(1)
> pick_matches(5, refSample, all) # each COLUMN is a desired sample vector
      [,1] [,2] [,3] [,4] [,5]
 [1,] "e"  "g"  "j"  "p"  "d"
 [2,] "g"  "i"  "l"  "r"  "f"
 [3,] "h"  "j"  "m"  "s"  "g"
 [4,] "n"  "p"  "s"  "y"  "m"
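Since there are only length(full) minus the total spacing valid start positions (17 for this example), every valid sample can also be enumerated up front, which makes it easy to draw distinct samples. The following is a sketch along the same lines as the answer, not the answerer's code; `all_matches` is a hypothetical name, and it assumes the reference elements all occur in the full vector in increasing order.

```r
all <- letters
refSample <- c("j", "l", "m", "s")

# Hypothetical helper (an assumption, not from the original answer):
# returns a matrix whose columns are all non-wrapping samples that
# share the spacing of `ref`.
all_matches <- function(ref, full) {
  iref <- match(ref, full)
  # Assumption: every element of ref is found, in increasing order.
  stopifnot(!anyNA(iref), all(diff(iref) > 0))
  offsets <- iref - iref[1]                             # e.g. 0 2 3 9
  max_start <- length(full) - offsets[length(offsets)]  # last non-wrapping start
  idx <- outer(offsets, seq_len(max_start), "+")        # one column per start
  matrix(full[idx], nrow = length(ref))
}

m <- all_matches(refSample, all)
dim(m)    # 4 17
m[, 1]    # "a" "c" "d" "j"
m[, 17]   # "q" "s" "t" "z"

# Draw 5 distinct samples (requires n <= ncol(m)).
picked <- m[, sample(ncol(m), 5)]
```

Enumerating first costs more memory than sampling start positions directly, so it mainly pays off when the full vector is small or when samples must not repeat.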
