r - Modified bootstrap

Question

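I am interested in developing a modified bootstrap that samples with replacement from a vector of length x, but has to meet a number of criteria before sampling stops. I am trying to calculate confidence intervals for lambda, a population growth rate, using 10000 iterations, but in some groupings of individuals, for example vector 13, very few individuals grow out of the group. A typical bootstrap would lead to a fair number of instances where no growth occurs in this vector, and the model then falls apart. Each vector consists of a certain number of 1s, 2s, and 3s, where 1 represents staying within a group, 2 growing out of a group, and 3 death. Here is what I have so far, without the modification. It is probably not the best approach in terms of run time, but I am new to R.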

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
      index <- sample(stage, replace=T)
      stay <- ((length(index[index==1]))/(length(index)))
      moved <- ((length(index[index==2]))/(length(index)))
      stagestay <- rbind(stagestay,stay)
      stagemoved <- rbind(stagemoved,moved)
}

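My question is: how can I modify the call to sample() so that it keeps sampling these numbers until the length of "index" is at least the same as that of st13, and until at least one instance of a 2 is in "index"?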

Thanks very much, Kristopher Hennig, Masters student, University of Mississippi, Oxford, MS, 38677

score 1 · Accepted Answer

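Update: The answer from @lselzer reminded me that the requirement was for the sample to be at least as long as st13. My original code (below) just keeps sampling until it finds a sample that contains a 2. The code from @lselzer grows the sample, one new index at a time, until the sample contains a 2. This is quite inefficient, as you might have to call sample() many times before you get a 2. My code might also repeat for a long time before a 2 is returned in the sample. So can we do any better?

One way is to take a big sample using a single call to sample(). Check which entries are 2s and see whether there is a 2 within the first length(st13) entries. If there is, return those entries; if not, find the first 2 in the big sample and return all entries up to and including that one. If there are no 2s at all, add on another big sample and repeat. Here is some code: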

#runs
n <- 100 ## reduced from 10000 for this illustration
stage <- st13
stagedead <- stagemoved <- stagestay <- Size <- vector()
sampSize <- 100 * (len <- length(stage)) ## sample size to try
for(i in seq_len(n)){
    ## take a large sample
    samp <- sample(stage, size = sampSize, replace = TRUE)
    ## check if there are any `2`s and which they are
    ## and if no 2s expand the sample
    while(length((twos <- which(samp == 2))) < 1) {
        samp <- c(samp, sample(stage, size = sampSize, replace = TRUE))
    }
    ## now we have a sample containing at least one 2
    ## so set index to the required set of elements
    if((min.two <- min(twos)) <= len) {
        index <- samp[seq_len(len)]
    } else {
        index <- samp[seq_len(min.two)]
    }
    stay <- length(index[index==1]) / length(index)
    moved <- length(index[index==2]) / length(index)
    stagestay[i] <- stay
    stagemoved[i] <- moved
    Size[i] <- length(index)
}
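
The loop body above can also be packaged as a function, which makes it easier to try on other vectors. The following is only a minimal sketch of the same idea, not a tested replacement for the loop. The function name sample_until_two and its arguments chunk and target are made up for illustration.

```r
## Hypothetical helper: draw with replacement from `stage` until the sample
## is at least length(stage) long and contains at least one `target` value.
sample_until_two <- function(stage, chunk = 100 * length(stage), target = 2) {
    ## guard: without a target in `stage` the while loop would never finish
    stopifnot(any(stage == target))
    len <- length(stage)
    samp <- sample(stage, size = chunk, replace = TRUE)
    ## extend the big sample until it is long enough and holds a target value
    while (length(samp) < len || !any(samp == target)) {
        samp <- c(samp, sample(stage, size = chunk, replace = TRUE))
    }
    first <- which(samp == target)[1]
    ## keep the first `len` entries, or more if the first target comes later
    samp[seq_len(max(len, first))]
}

set.seed(1)  # arbitrary seed, only for reproducibility
## same composition as st13: 41 ones, three 2s, two 3s
st13 <- c(rep(1, 41), rep(2, 3), rep(3, 2))
index <- sample_until_two(st13)
length(index) >= length(st13)  # TRUE by construction
any(index == 2)                # TRUE by construction
```

Taking max(len, first) elements is equivalent to the if/else on min.two in the loop above.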

Here is a really degenerate vector, with only a single 2 in 46 entries:

R> st14 <- sample(c(rep(1, 45), 2))
R> st14
 [1] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1

If I use the loop above on it instead of on st13, I get the following minimum sample sizes required to obtain a 2 in each of the 100 runs:

R> Size
  [1]  65  46  46  46  75  46  46  57  46 106  46  46  46  66  46  46  46  46
 [19]  46  46  46  46  46 279  52  46  63  70  46  46  90 107  46  46  46  87
 [37] 130  46  46  46  46  46  46  60  46 167  46  46  46  71  77  46  46  84
 [55]  58  90 112  52  46  53  85  46  59 302 108  46  46  46  46  46 174  46
 [73] 165 103  46 110  46  80  46 166  46  46  46  65  46  46  46 286  71  46
 [91] 131  61  46  46 141  46  46  53  47  83

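So this suggests that the sampSize I chose (100 * length(stage)) is a bit of overkill here, but as all the operators we are using are vectorised, we probably don't incur much of a penalty for the overly long initial sample, and we certainly don't incur any extra sample() calls.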
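
A little arithmetic shows why the initial sample is so generous. If a fraction p of the entries are 2s, a sample of m independent draws with replacement contains no 2 with probability (1 - p)^m. The sketch below evaluates this for the degenerate vector; the variable names p_extend and p_second_call are illustrative only.

```r
p <- 1 / 46            # proportion of 2s in st14 (one 2 among 46 entries)
len <- 46              # length(st14)
sampSize <- 100 * len  # the initial sample size used in the loop

## chance that the first `len` draws contain no 2,
## so that the returned sample has to be longer than 46
p_extend <- (1 - p)^len

## chance that the whole initial big sample contains no 2,
## which is what would force a second call to sample()
p_second_call <- (1 - p)^sampSize

p_extend       # roughly 0.36
p_second_call  # effectively zero
```

Under these assumptions a much smaller multiple of length(stage) would still make a second sample() call very unlikely.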

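Original: If I understand you correctly, the problem is that sample() might not return any 2s at all. If so, we can keep sampling until it does, using the repeat control-flow construct.

I have altered your code accordingly and optimised it a bit, because you should never grow objects in a loop the way you were doing. There are other ways this could be improved, but I will stick with the loop for now. An explanation follows below.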

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagedead <- stagemoved <- stagestay <- vector()
for(i in seq_len(n)){
    repeat {
        index <- sample(stage, replace = TRUE)
        if(any(index == 2)) {
            break
        }
    }
    stay <- length(index[index==1]) / length(index)
    moved <- length(index[index==2]) / length(index)
    stagestay[i] <- stay
    stagemoved[i] <- moved
}

This is the main change in relation to your question:

    repeat {
        index <- sample(stage, replace = TRUE)
        if(any(index == 2)) {
            break
        }
    }

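What this does is repeat the code contained in the braces until a break is triggered to get us out of the repeat loop. So we take a bootstrap sample and then check whether it contains any 2s. If it does, we break out and carry on with the rest of the current for loop iteration. If the sample doesn't contain any 2s, the break is not triggered and we go round again, taking another sample. This continues until we get a sample containing a 2.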
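
One caveat: if stage contains no 2 at all, this repeat loop never terminates. A minimal sketch of a guarded variant follows, assuming a cap on the number of attempts is acceptable. The function name resample_with_two and the argument max_tries are hypothetical, not part of the code above.

```r
## Hypothetical guarded version of the repeat loop: give up after `max_tries`
## bootstrap samples instead of looping forever.
resample_with_two <- function(stage, max_tries = 1000) {
    for (attempt in seq_len(max_tries)) {
        index <- sample(stage, replace = TRUE)
        if (any(index == 2)) {
            return(index)
        }
    }
    stop("no bootstrap sample containing a 2 after ", max_tries, " tries")
}

## same composition as st13: 41 ones, three 2s, two 3s
st13 <- c(rep(1, 41), rep(2, 3), rep(3, 2))
index <- resample_with_two(st13)
stay  <- mean(index == 1)  # same value as length(index[index==1]) / length(index)
moved <- mean(index == 2)
```

Inside the for loop over runs, the repeat block could then be replaced by a single call to this function.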

score 0 · Accepted Answer

For starters, sample() has a size argument that you can use to match the length of st13. The second part of your question can be solved with a while loop.

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
      index <- sample(stage, length(stage), replace=TRUE)
      while(!any(index == 2)) {
        index <- c(index, sample(stage, 1, replace = TRUE))
      }
      stay <- ((length(index[index==1]))/(length(index)))
      moved <- ((length(index[index==2]))/(length(index)))
      stagestay[i] <- stay
      stagemoved[i] <- moved
}

While I was writing this, Gavin posted his answer, which is similar to mine, but I added the size argument to make sure that index has at least the length of st13.
